This talk describes our journey of ingesting multiple Kafka data streams, spanning thousands of topics and roughly half a million partitions, into Apache Iceberg datasets, and the issues we ran into along the way. We will look at the CDC streams produced from our MySQL databases by Debezium, how we decided to process and store the data, and how our data teams now access the information. Join us on a whirlwind tour through Kafka Connect, Avro schemas, Iceberg tables, table evolution, breaking schema changes, recurring exceptions, fun bugs, and why timestamps are hard. Finally, we will discuss some of the solutions these datasets have enabled for us and how the data is now used.
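
To give a flavour of the kind of pipeline the talk covers, here is a minimal sketch of registering a Debezium MySQL source connector with the Kafka Connect REST API. Everything specific in it is an assumption for illustration only: the connector name, hostnames, database and table names, credentials, and schema registry URL are placeholders, not the speakers' actual production configuration.

```python
import json

import requests

# Hypothetical Connect endpoint and connector name, used only for this sketch.
CONNECT_URL = "http://localhost:8083"
CONNECTOR_NAME = "inventory-mysql-cdc"

# A minimal Debezium MySQL source connector config: it snapshots the listed
# tables, then streams row-level changes from the MySQL binlog into Kafka
# topics prefixed with `topic.prefix`.
config = {
    "connector.class": "io.debezium.connector.mysql.MySqlConnector",
    "database.hostname": "mysql.internal",   # placeholder host
    "database.port": "3306",
    "database.user": "debezium",
    "database.password": "change-me",        # placeholder credential
    "database.server.id": "184054",          # must be unique in the MySQL cluster
    "topic.prefix": "inventory",             # prefix for the emitted Kafka topics
    "table.include.list": "inventory.orders,inventory.customers",
    # Record DDL history so Debezium can keep interpreting the binlog across restarts.
    "schema.history.internal.kafka.bootstrap.servers": "kafka:9092",
    "schema.history.internal.kafka.topic": "schema-history.inventory",
    # Emit Avro so downstream consumers (e.g. an Iceberg sink) get typed schemas.
    "key.converter": "io.confluent.connect.avro.AvroConverter",
    "value.converter": "io.confluent.connect.avro.AvroConverter",
    "key.converter.schema.registry.url": "http://schema-registry:8081",
    "value.converter.schema.registry.url": "http://schema-registry:8081",
}

# PUT /connectors/{name}/config creates the connector or updates it in place.
resp = requests.put(
    f"{CONNECT_URL}/connectors/{CONNECTOR_NAME}/config",
    headers={"Content-Type": "application/json"},
    data=json.dumps(config),
)
resp.raise_for_status()
print(resp.json())
```

From there, a Kafka Connect sink (or a separate ingestion job) would write the resulting Avro-encoded change events into Iceberg tables; the talk itself goes into how that downstream path was built and operated.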