Change data capture (CDC) is used to copy data across relational databases, enabling essential backend operations like data synchronization, migration, and disaster recovery. And now, with stream processing, you can build CDC pipelines that power event-driven applications and trusted data products, with fresh, processed data integrated across legacy and modern, distributed systems.
See how Confluent brings Apache Kafka® and Apache Flink® together so you can build streaming CDC pipelines and power downstream analytics with fresh, high-quality operational data.
Most organizations already use log-based CDC to turn database changes into events.
Building CDC pipelines with Kafka and Flink lets you unify your CDC workloads and batch analytics and eliminate processing silos. Instead of waiting on batch processing, taking on the costs of redundant processing, or relying on fragile pipelines, this architecture allows you to process change data once, in flight, and serve both operational and analytical consumers from the same streams.
With serverless Apache Flink® on the Confluent data streaming platform, you can shift processing left—before data ingestion—to improve latency, data portability, and cost-effectiveness.
AppDev teams can build data pipelines that unlock timely action
This includes shift-left data warehouse and data lake ingestion for analytics, real-time search index building, ML pipelines, and SIEM optimization.
Analytics teams can prep and shape data to feed event-driven applications by triggering computations, state updates, or external actions
This includes applications built for GenAI solutions, fraud detection, real-time alerting and notifications, marketing personalization, and more.
With Confluent, you can process your CDC streams before you materialize them in your analytics estate. Simply filter, join, and enrich change data captured in your Kafka topics with Flink SQL, then materialize the resulting streams within both your operational and analytics estates.
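To make that concrete, here is a minimal Flink SQL sketch of the filter-join-enrich step. The `orders` and `customers` tables and their columns are illustrative assumptions, not part of the labs below; on Confluent Cloud for Apache Flink, CDC topics typically surface as queryable tables like these.

```sql
-- Hypothetical example: enrich an order CDC stream with customer attributes,
-- keeping only completed orders, and materialize the result as a new table
-- (backed by a Kafka topic) that operational and analytical systems can consume.
CREATE TABLE enriched_orders AS
SELECT
  o.order_id,
  o.amount,
  o.order_ts,
  c.customer_name,
  c.region
FROM orders o
JOIN customers c
  ON o.customer_id = c.customer_id
WHERE o.status = 'COMPLETED';
```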
Confluent customers are using Flink to enhance existing CDC use cases like data synchronization and disaster recovery and unlock new real-time capabilities.
Explore the GitHub repo to learn how to implement real-time analytics for customer 360 and product sales analysis, or for daily sales trend analysis.
You’ll have 2 labs to choose from:
Product Sales and Customer360 Aggregation Lab
Clean and aggregate product sales data, ingest the enriched data to Snowflake or Redshift, and then create a data product for operational databases to consume.
Daily Sales Trends Lab
Validate payments, analyze sales patterns to identify daily trends, then materialize the Kafka topic as an Iceberg table in Amazon Athena for deeper insights.
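As a rough illustration of the kind of query the Daily Sales Trends lab builds toward, here is a hedged Flink SQL sketch of a daily windowed aggregation over validated payments. The `payments` table, its columns, and the validation filter are assumptions for illustration, not the lab's actual code.

```sql
-- Illustrative daily sales trend aggregation over a payments stream.
-- Assumes `payment_ts` is a watermarked event-time column so each
-- tumbling window can close once a day's data is complete.
SELECT
  window_start,
  window_end,
  product_id,
  COUNT(*)    AS completed_payments,
  SUM(amount) AS daily_revenue
FROM TABLE(
  TUMBLE(TABLE payments, DESCRIPTOR(payment_ts), INTERVAL '1' DAY))
WHERE payment_status = 'VALID'
GROUP BY window_start, window_end, product_id;
```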
“Adopting CDC has allowed us to unleash the power of real-time data and ultimately migrate away from batch data workloads to stream processing.”
“With Flink, we now have the opportunity to shift left and do a lot of early data transformations and computation on our data before it reaches Snowflake. This will optimize our data processing costs to increase the amount of data we have available.”
“With Confluent, we can now easily build the CDC pipelines we need to acquire data in real time rather than retrieving it in batches every 10 minutes, enabling us to detect fraud quickly.”
“The most difficult thing was we didn’t have enough internal resources to develop CDC and the streaming process. Now, we can easily build CDC systems…the developer team was able to decrease their workload while developing the streaming process.”
“With Confluent Cloud, we can now provide operational data in real time to any team that needs it. This is really powerful and significantly reduces our operational burden.”
Ready to start processing CDC data in real time with Flink? Get started on Confluent and implement a stream processing architecture ready for any environment.
Try Confluent Cloud for Apache Flink®—available on AWS, Google Cloud, and Microsoft Azure—to build applications leveraging Kafka + Flink with serverless, cloud-native cost efficiency and simplicity.
And with Confluent Platform for Apache Flink®, you can bring your existing Flink workloads to a self-managed data streaming platform, ready to deploy on-premises or in your private cloud.
A streaming approach allows you to "shift left," processing and governing data closer to the source. Instead of running separate, costly ELT jobs in multiple downstream systems, you process the data once in-stream with Flink to create a single, reusable, high-quality data product. This improves data quality, reduces overall processing costs and risks, and gets trustworthy data to your teams faster.
Apache Flink® is the de facto standard for stateful stream processing, designed for high-performance, low-latency workloads—making it ideal for CDC. Its ability to handle stateful computations allows it to accurately interpret streams of inserts, updates, and deletes to maintain a correct, materialized view of data over time. Confluent offers a fully managed, serverless Flink service that removes the operational burden of self-management.
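To see why that statefulness matters, consider a simple aggregate over a table backed by a CDC changelog. Because Flink keeps per-key state, an upstream update or delete retracts the old contribution and applies the new one, so the result keeps matching the source database. The `orders` table and its columns below are hypothetical.

```sql
-- Hypothetical: `orders` is backed by a CDC changelog (inserts, updates, deletes).
-- Flink maintains this aggregate statefully: when an order is updated or deleted
-- upstream, its previous contribution is retracted, keeping the totals consistent.
SELECT
  customer_id,
  COUNT(*)    AS open_orders,
  SUM(amount) AS open_order_value
FROM orders
WHERE status = 'OPEN'
GROUP BY customer_id;
```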
Data consistency is maintained by processing CDC events in-flight to filter duplicates, join streams for enrichment, and aggregate data correctly before it reaches any downstream system. Confluent's platform integrates Flink with Stream Governance, including Schema Registry, to define and enforce universal data standards, ensuring data compatibility, quality, and lineage tracking across your organization.
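For example, duplicate or replayed change events can be collapsed in-flight with Flink SQL's deduplication pattern, which keeps only the latest event per key. The `orders_raw` table and `updated_at` time attribute are illustrative assumptions.

```sql
-- Illustrative deduplication before downstream delivery: keep only the most
-- recent change event per order_id, ranked by the event-time attribute.
SELECT order_id, status, amount, updated_at
FROM (
  SELECT *,
         ROW_NUMBER() OVER (
           PARTITION BY order_id
           ORDER BY updated_at DESC) AS row_num
  FROM orders_raw)
WHERE row_num = 1;
```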
When your CDC pipeline is integrated with Confluent Schema Registry, it can automatically and safely handle schema evolution. This ensures that changes to the source table structure—like adding or removing columns—do not break downstream applications or data integrity. The platform manages schema compatibility, allowing your data streams to evolve seamlessly.
A fully managed service eliminates the significant operational complexity, steep learning curve, and high in-house support costs associated with self-managing Apache Flink®. With Confluent, you get a serverless experience with elastic scalability, automated updates, and pay-as-you-go pricing, allowing your developers to focus on building applications rather than managing infrastructure. In addition, native integration between Apache Kafka® and Apache Flink® and pre-built connectors allow teams to build and scale fast.
Confluent Cloud provides first-class support for Debezium, an open source distributed platform for change data capture. Pre-built connectors can automatically interpret the complex structure of Debezium CDC event streams, simplifying the process of integrating with Kafka and Flink.
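As one hedged example of what this looks like at the SQL level, open-source Flink can declare a Kafka topic carrying Debezium change events with the `debezium-json` format, so the before/after envelope is interpreted as a changelog. On Confluent Cloud, the pre-built connectors and the Flink integration take care of this wiring; the topic, server, and column names below are illustrative.

```sql
-- Open-source Flink SQL sketch: read a Debezium change event topic as a changelog table.
-- Topic, bootstrap server, and columns are illustrative placeholders.
CREATE TABLE customers (
  customer_id   BIGINT,
  customer_name STRING,
  region        STRING
) WITH (
  'connector' = 'kafka',
  'topic' = 'dbserver1.inventory.customers',
  'properties.bootstrap.servers' = 'localhost:9092',
  'scan.startup.mode' = 'earliest-offset',
  'format' = 'debezium-json'
);
```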