Incorrect data produced to Kafka can act as a poison pill with the potential to disrupt businesses built on Kafka. The “Semantic Validation” feature is designed to address the challenges posed by incorrect or unexpected data in Kafka’s data processing pipelines. By allowing users to define robust field constraints directly within schemas such as Avro, we aim to improve data quality and minimize the downstream impact of inaccurate data in Kafka.
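The abstract does not spell out the rule syntax, but as a rough illustration of the idea, here is a minimal Java sketch: it embeds hypothetical range constraints as custom Avro field properties (the names “semantic.min” and “semantic.max” are invented for this example, not Uber’s actual annotations) and checks a record against them before it would be produced.

```java
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;

public class SemanticValidationSketch {
    // Hypothetical schema: field-level constraints are attached as custom
    // Avro properties. Avro preserves unknown field attributes as props,
    // so they travel with the schema through the registry.
    private static final String SCHEMA_JSON = """
        {
          "type": "record",
          "name": "Trip",
          "fields": [
            {"name": "trip_id", "type": "string"},
            {"name": "distance_km", "type": "double",
             "semantic.min": 0.0, "semantic.max": 1000.0}
          ]
        }""";

    public static void main(String[] args) {
        Schema schema = new Schema.Parser().parse(SCHEMA_JSON);

        GenericRecord trip = new GenericData.Record(schema);
        trip.put("trip_id", "t-42");
        trip.put("distance_km", -3.5); // violates the "semantic.min" bound

        for (Schema.Field field : schema.getFields()) {
            Object min = field.getObjectProp("semantic.min");
            Object max = field.getObjectProp("semantic.max");
            if (min == null && max == null) {
                continue; // field carries no semantic constraints
            }
            double value = ((Number) trip.get(field.name())).doubleValue();
            if ((min != null && value < ((Number) min).doubleValue())
                    || (max != null && value > ((Number) max).doubleValue())) {
                // A real pipeline might reject the record or route it to a
                // dead-letter topic instead of just reporting it.
                System.out.printf("Constraint violation on '%s': %s%n",
                        field.name(), value);
            }
        }
    }
}
```

In practice a check like this would typically run in the producer’s serialization path or inside a Flink job, so that violating records are rejected or quarantined before they reach downstream consumers rather than merely logged.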
Furthermore, this feature can be extended beyond real-time processing with Kafka and Flink to cover offline data processing as well. By spanning real-time processing, batch analytics, and AI data pipelines, it can form the basis of a global semantic validation system.
In our upcoming talk, we will delve into the use cases for this feature, discuss its architecture, give examples of defining rules, and explain how we enforce them. Ultimately, we will demonstrate how this feature can significantly enhance the reliability and trustworthiness of Uber’s data processing pipelines.