Many companies face the challenge of gathering data from disparate sources, organizing these data into a knowledge base, and serving up this information in a relevant way to their customers. With over 19 billion digitized historical records, 80 million family trees, 10 billion people/tree nodes, and three million DNA customers, Ancestry has a trove of data that helps people gain a greater sense of who they are, where they came from, and who else shares their family’s journey.
Companies with big data challenges often integrate Apache Kafka into their data architecture; yet even after a successful Kafka integration, dealing with massive quantities of data can still be overwhelming and debilitating. How can you make sense of all the data from various sources in your warehouse, meet the needs of a growing business, and remain competitive? One of the ways Ancestry tackled this challenge was by introducing the schema registry into the heart of the data fabric and development process. The results have been transformational, dramatically reducing the time it takes to go from data source definition to reporting and production. Before the schema registry, new data sets could take months to implement and then get lost in the petabytes of data.
Join this session to learn how Ancestry brought new data sets into production not in months, or even weeks, but in DAYS, and how the schema registry transformed our data fabric.