Presentation

Dynamic Change Data Capture with Flink CDC and Consistent Hashing

« Current 2023

Change Data Capture (CDC) is a popular technique for extracting data from databases in real time. However, many CDC deployments are static: for example, a single connector is configured to ingest data from one or several tables.

At Goldsky, we needed a way to configure CDC for a large Postgres database dynamically: the list of tables to ingest is driven by customer-facing features and is constantly changing.

We started using Flink CDC connectors built on top of the Debezium project, but we immediately faced many challenges caused mainly by the lack of incremental snapshotting.

But even after implementing incremental snapshotting ourselves, we still faced an issue around using replication slots in Postgres: we couldn't use a single connector to ingest all tables (it's just too much data), and we couldn't create a new connector for every new set of tables (we'd quickly run out of replication slots). So we needed to find a way to maintain a fixed number of replication slots for a dynamic list of tables.

In the end, we chose a consistent hashing algorithm to distribute the list of tables across multiple Flink jobs. The jobs also required some customizations to support the incremental snapshotting semantics from Flink CDC.
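To make the idea concrete, here is a minimal sketch of a consistent hash ring that maps table names onto a fixed set of jobs. It is not the implementation from the talk: the `HashRing` class, the job names, the MD5-based hash, and the number of virtual nodes are all assumptions chosen for illustration. The sketch also assumes that each job owns one replication slot, so a fixed job count implies a fixed slot count.

```python
import bisect
import hashlib


def stable_hash(key: str) -> int:
    # Python's built-in hash() is salted per process, so use a digest
    # to get the same ring positions on every run.
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Hypothetical consistent hash ring: table name -> job name."""

    def __init__(self, jobs, vnodes=64):
        self._vnodes = vnodes  # virtual nodes per job, smooths the distribution
        self._ring = []        # sorted list of (position, job)
        for job in jobs:
            self.add(job)

    def add(self, job):
        # Place several virtual nodes for the job on the ring.
        for i in range(self._vnodes):
            bisect.insort(self._ring, (stable_hash(f"{job}#{i}"), job))

    def remove(self, job):
        self._ring = [(pos, j) for pos, j in self._ring if j != job]

    def lookup(self, table):
        if not self._ring:
            raise ValueError("ring has no jobs")
        # First virtual node at or after the table's position, wrapping around.
        idx = bisect.bisect_left(self._ring, (stable_hash(table), ""))
        return self._ring[idx % len(self._ring)][1]

    def assign(self, tables):
        # Group the current table list by owning job.
        assignment = {}
        for table in tables:
            assignment.setdefault(self.lookup(table), []).append(table)
        return assignment


if __name__ == "__main__":
    ring = HashRing(["cdc-job-0", "cdc-job-1", "cdc-job-2"])
    tables = [f"public.table_{i}" for i in range(10)]
    for job, owned in sorted(ring.assign(tables).items()):
        print(job, owned)
```

Because each table is looked up independently, adding or dropping a table leaves every other table's assignment unchanged. If a job is later added to the ring, only the tables that now fall to the new job move, which is the property that makes this scheme attractive when reassigning a table means re-running a snapshot.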

We learned a lot about Debezium, Flink CDC and Postgres replication, and we're excited to share our learnings with the community!

Presenter

Xiao Meng

Goldsky

Xiao Meng is a software engineer with a strong interest in data infrastructure, stream processing and SRE.

Currently, Xiao is working at Goldsky on building real-time data infrastructure for Web 3. Before that, Xiao worked as an Expert Data Engineer at Activision/Demonware, where he built a real-time game telemetry data platform for online games such as Call of Duty.

Presenter

Yaroslav Tkachenko

Goldsky

Yaroslav Tkachenko is a software engineer interested in distributed systems, microservices, data-intensive applications, modern cloud infrastructure, and DevOps practices.

Currently, Yaroslav is a Principal Software Engineer at Goldsky, focused on building a read layer for blockchain data that leverages the power of stream processing.

Before that, Yaroslav was a Staff Data Engineer at Shopify, working on building and supporting libraries, tools and services for Shopify's stream-processing use cases. Previously, he was a Senior Software Engineer and later Software Architect at Activision, where he redesigned and rebuilt the data pipeline for Activision games like the Call of Duty franchise. Before joining Activision, Yaroslav held various leadership roles in multiple startups.
