Presentation

10 tips for enabling data discovery and governance in your organization

Current 2023

Discovery is the first barrier to using data. As data platforms and systems scale, so does the ability of stakeholders to create more and more data, and more data means things are harder to find. Data products need to be cataloged on the go, not only for discovery but also for governance purposes. Data can also exist in many forms (reports, tables, files, streams, services, logs) and may go through multiple hops of processing by multiple teams before it becomes a curated data product. Furthermore, a typical data organization will have a plethora of platform and infrastructure pieces: some open source, some cloud-based, and some custom. To build a robust discovery ecosystem, cataloging must happen continuously, at each hop and for every component in the organization. This becomes challenging without a central team overseeing the entire process.

In this presentation, I will talk about how we solved the problem of cataloging and discovery using DataHub as our discovery platform. I will cover the details of how we went about ingesting metadata from a wide range of infrastructure and platform components (such as Snowflake, Looker, Terraform, Airflow, Kinesis, and custom declarative configs) that are involved in a typical data product lifecycle at Chime. I will also talk about the processes and design principles we used to make cataloging and data governance a part of our DNA.

Presenter

Sherin Thomas

Chime

Sherin is a Software Engineer with over 12 years of experience at companies like Google, Twitter, Lyft, Netflix, and Chime. She works in the fields of Big Data, Streaming, ML/AI, and Distributed Systems. Currently, she's building a shiny new data platform at Chime. Sherin has presented on ML and Streaming at various reputable conferences, including a keynote address, and has judged various awards such as the SXSW Innovation Awards and CES.

Recently, she advised NASA's SpaceML program and helped build a platform for processing petabytes of satellite imagery to detect weather patterns and label raw data for climate-science-related AI research. She also writes a blog where she shares her thoughts on technology, work, and career.

When she's not doing technical stuff, she enjoys painting, reading, perusing the art and fashion section of the New York Times, and spending time with her husband and toddler.
