Running production CDC ingestion pipelines at scale in Robinhood

« Current 2022

Robinhood’s mission is to democratize finance for all. Data-driven decision making is key to achieving this goal. The data needed is hosted in various OLTP databases. Replicating this data reliably and in near real time to the data lakehouse powers many critical use cases for the company. At Robinhood, CDC is used not only for ingestion into the data lake but is also being adopted for inter-system message exchange between online microservices.

In this talk, we will describe the evolution of change data capture (CDC) based ingestion at Robinhood, not only in terms of the scale of data stored and queries made, but also in terms of the use cases it supports. We will go in depth into the CDC architecture built around our Kafka ecosystem using the open source systems Debezium and Apache Hudi. We will also cover online inter-system message exchange use cases, our experience running this service at scale at Robinhood, and the lessons learned.

Moderator

Balaji Varadarajan

Robinhood

Balaji Varadarajan is a Sr. Staff Engineer at Robinhood, where he broadly oversees Robinhood’s data lake. He is also an Apache Hudi PMC member. Previously, he was a tech lead on Uber’s data ingestion team and one of the lead engineers on LinkedIn’s Databus change capture system. Balaji’s interests lie in distributed data systems.

Moderator

Pritam K Dey

Robinhood
