Robinhood’s mission is to democratize finance for all. Data-driven decision making is key to achieving this goal. The data needed for these decisions is hosted in various OLTP databases. Replicating this data to the data lakehouse reliably and in near real time powers many critical use cases for the company. At Robinhood, change data capture (CDC) is used not only for ingestion into the data lake but is also being adopted for inter-system message exchange between online microservices.
In this talk, we will describe the evolution of change data capture based ingestion at Robinhood, not only in terms of the scale of data stored and queries made, but also in terms of the use cases it supports. We will go in depth into the CDC architecture built around our Kafka ecosystem using the open source systems Debezium and Apache Hudi. We will also cover online inter-system message exchange use cases, our experience running this service at scale at Robinhood, and the lessons learned.