
Presentation

Semantic Validation: Enforcing Kafka Data Quality Through Schema-Driven Verification

Kafka Summit London 2024

Incorrect data produced into Kafka can act as a poison pill, with the potential to disrupt the businesses built on top of it. The “Semantic Validation” feature addresses the challenges that incorrect or unexpected data poses to Kafka’s data processing pipelines. By letting users define robust field constraints directly within schemas such as Avro, we aim to improve data quality and minimize the downstream impact of inaccurate data in Kafka.
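To make the idea concrete, here is a minimal sketch of schema-driven validation, assuming constraints are attached as custom Avro field properties. The property names (min, max, pattern), the Trip schema, and the validator are illustrative stand-ins, not the rule syntax or implementation presented in the talk.

```java
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;

public class SemanticValidatorSketch {

    // Constraints ride along as custom field properties. "min", "max", and
    // "pattern" are illustrative names, not part of the Avro specification.
    static final Schema SCHEMA = new Schema.Parser().parse("""
        {
          "type": "record",
          "name": "Trip",
          "fields": [
            {"name": "trip_id", "type": "string", "pattern": "^[a-f0-9-]{36}$"},
            {"name": "fare_usd", "type": "double", "min": 0.0, "max": 10000.0}
          ]
        }
        """);

    // Walk the record's fields and enforce whatever constraint properties
    // are present on each field's schema entry.
    static void validate(GenericRecord record) {
        for (Schema.Field field : record.getSchema().getFields()) {
            Object value = record.get(field.name());
            Object min = field.getObjectProp("min");
            if (min instanceof Number lo && ((Number) value).doubleValue() < lo.doubleValue()) {
                throw new IllegalArgumentException(field.name() + " below min " + lo);
            }
            Object max = field.getObjectProp("max");
            if (max instanceof Number hi && ((Number) value).doubleValue() > hi.doubleValue()) {
                throw new IllegalArgumentException(field.name() + " above max " + hi);
            }
            Object pattern = field.getObjectProp("pattern");
            if (pattern instanceof String p && !value.toString().matches(p)) {
                throw new IllegalArgumentException(field.name() + " fails pattern " + p);
            }
        }
    }

    public static void main(String[] args) {
        GenericRecord trip = new GenericData.Record(SCHEMA);
        trip.put("trip_id", "not-a-uuid");
        trip.put("fare_usd", -4.2);
        validate(trip); // throws: trip_id does not match its pattern
    }
}
```

A producer interceptor or serializer wrapper could invoke such a validate() step before a record is written, rejecting poison pills at the edge rather than letting them propagate downstream.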

Furthermore, the feature can be extended beyond Kafka and Flink real-time processing to cover offline data processing as well. By spanning real-time processing, batch analytics, and AI data pipelines, it enables a global semantic validation system, as sketched below.
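The key property of such a system is that a rule is declared once and enforced by every engine. The sketch below illustrates this with a single hypothetical constraint shared by a streaming-style check and a batch scan; the names (FARE_RULE, enforce) are invented for illustration, and the engine integration (Kafka consumers, Flink operators, batch jobs) is elided.

```java
import java.util.List;
import java.util.function.Predicate;

public class GlobalValidationSketch {

    record Trip(String tripId, double fareUsd) {}

    // The rule is declared once, independent of any processing engine.
    static final Predicate<Trip> FARE_RULE = t -> t.fareUsd() >= 0.0;

    static void enforce(Trip t) {
        if (!FARE_RULE.test(t)) {
            throw new IllegalArgumentException("fare_usd must be non-negative: " + t);
        }
    }

    public static void main(String[] args) {
        // Real-time path: validate each event as it arrives (a stand-in
        // for a Kafka consumer loop or a Flink operator).
        enforce(new Trip("t-1", 12.5));

        // Batch path: the identical rule applied over an offline dataset.
        List<Trip> offline = List.of(new Trip("t-2", 3.0), new Trip("t-3", 7.25));
        offline.forEach(GlobalValidationSketch::enforce);
    }
}
```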

In this talk, we will delve into the use cases of this feature, discuss its architecture, walk through examples of rule definitions, and explain how the rules are enforced. Finally, we will demonstrate how the feature significantly improves the reliability and trustworthiness of Uber’s data processing pipelines.
