
Building Retry Architectures in Kafka with Compacted Topics

Kafka Summit Americas 2021

In this talk, we'll discuss how VillageMD uses Kafka topic compaction to rapidly scale our reprocessing pipelines to encompass hundreds of feeds. Within healthcare data ecosystems, privacy and data minimization are key design priorities. Being able to handle data deletion reliably and promptly within event-driven architectures is increasingly necessary under regulations such as the GDPR and HIPAA.

We'll give an overview of building and governing dead-letter queues for streaming data processing.
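As a rough sketch of the dead-letter pattern (the function names and metadata fields here are illustrative, not VillageMD's actual implementation), per-record processing can be wrapped so that failures are captured with enough context to reprocess them later. In a real pipeline the sink would be a Kafka producer writing to a dedicated dead-letter topic, keyed by a stable record ID:

```python
import time
import traceback

def process_with_dlq(records, process_record, dead_letter_sink):
    """Process each (key, value) record; route failures to a dead-letter sink.

    `process_record` and the metadata fields below are hypothetical --
    in production the sink would publish to a dead-letter Kafka topic
    keyed by record ID, so compaction can later deduplicate retries.
    """
    for key, value in records:
        try:
            process_record(key, value)
        except Exception as exc:
            # Enrich the failed record with context needed for reprocessing.
            dead_letter_sink.append({
                "key": key,
                "value": value,
                "error": repr(exc),
                "stacktrace": traceback.format_exc(),
                "failed_at": time.time(),
            })

# Example: one record fails validation and lands in the DLQ.
dlq = []
def validate(key, value):
    if value is None:
        raise ValueError(f"missing payload for {key}")

process_with_dlq([("a", {"x": 1}), ("b", None)], validate, dlq)
print([d["key"] for d in dlq])  # only "b" failed
```

Keying dead-letter records by a stable ID is what makes the compaction-based cleanup described below possible.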

We'll discuss:

  1. How to architect a data sink for failed records.
  2. How topic compaction can reduce duplicate data and enable idempotency.
  3. Building a tombstoning system for removing successfully reprocessed records from the queues.
  4. Considerations for monitoring a reprocessing system in production: what metrics, DataOps practices, and SLAs are useful?
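The compaction and tombstoning behavior in points 2 and 3 can be illustrated with a small simulation (pure Python standing in for the broker's log cleaner, not the talk's actual code): compaction retains only the latest value per key, and producing a null-valued record (a tombstone) for a key removes it once compaction runs.

```python
def compact(log):
    """Simulate Kafka log compaction: keep only the latest record per key,
    then drop keys whose latest value is None (a tombstone)."""
    latest = {}
    for key, value in log:  # records in offset order
        latest[key] = value
    return {k: v for k, v in latest.items() if v is not None}

# A retry topic keyed by record ID: record "42" failed twice, then was
# successfully reprocessed, so a tombstone retires it from the queue.
retry_log = [
    ("42", {"attempt": 1, "error": "timeout"}),
    ("42", {"attempt": 2, "error": "timeout"}),
    ("7",  {"attempt": 1, "error": "bad schema"}),
    ("42", None),  # tombstone: record 42 was reprocessed successfully
]
print(compact(retry_log))  # {'7': {'attempt': 1, 'error': 'bad schema'}}
```

On a real cluster this corresponds to creating the retry topic with `cleanup.policy=compact`, so the broker deduplicates repeated failures per key and eventually removes tombstoned records entirely.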
