
Presentation

10 tips for enabling data discovery and governance in your organization

Current 2023

Discovery is the first barrier to using data. As data platforms and systems scale, so does the ability of stakeholders to create more and more data, and more data means things are harder to find. Data products need to be cataloged on the go, not only for discovery but also for governance. Data can also exist in many forms (reports, tables, files, streams, services, logs) and may go through multiple hops of processing by multiple teams before it becomes a curated data product. Furthermore, a typical data organization will have a plethora of platform and infrastructure pieces: some open source, some cloud based, and some custom. To build a robust discovery ecosystem, cataloging must happen continuously, at each hop and for every component in the organization. This becomes challenging without a central team overseeing the entire process.

In this presentation, I will talk about how we solved the problem of cataloging and discovery using DataHub as our discovery platform. I will cover the details of how we ingested metadata from the many infrastructure and platform components (such as Snowflake, Looker, Terraform, Airflow, Kinesis, and custom declarative configs) that are involved in a typical data product lifecycle at Chime. I will also talk about the processes and design principles we used to make cataloging and data governance part of our DNA.
