This talk describes our journey of ingesting multiple Kafka data streams from thousands of topics and about half a million partitions, storing them as Apache Iceberg datasets, and the issues we ran into along the way. We will look at the CDC streams that Debezium produces from our MySQL databases, how we decided to process and store the data, and how our data teams now access it. Join us on a whirlwind tour through Kafka Connect, Avro schemas, Iceberg tables, table evolution, breaking schema changes, recurring exceptions, fun bugs, and why timestamps are hard. Finally, we will discuss some of the solutions these datasets have enabled for us and how the data is used today.