The key question to answer: given the Flink job definition of a real-time feature, the historical Kafka data, and an exact point in time in the past, can we confidently say what the feature value in Redis would have been at that moment?
In this talk, I will demonstrate that it is impossible to answer this question accurately. Approaches such as Flink's batch mode cannot accurately handle late events, nor can they tell us when, in processing time, a window closed. In fact, they practically guarantee feature leakage.
The key reason behind the challenge is that historical Kafka records contain only Kafka producer timestamps (event time), not the processing time at which each record was originally consumed. This makes it extremely difficult to say with certainty which records would have made it into a particular window, and when exactly, in processing time, an event-time window would have fired based on watermarks.
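To make the problem concrete, here is a minimal sketch of the kind of event-time feature job in question (the Event class, field names, window size, and lateness bound are hypothetical illustrations, not taken from the talk):

```java
import java.time.Duration;

import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

public class FeatureJobSketch {

    /** Hypothetical click event; the timestamp is the Kafka producer timestamp (event time). */
    public static class Event {
        public String userId;
        public long timestamp; // event time, millis
        public double amount;

        public Event() {}

        public Event(String userId, long timestamp, double amount) {
            this.userId = userId;
            this.timestamp = timestamp;
            this.amount = amount;
        }
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        env.fromElements(
                new Event("u1", 1_000L, 1.0),
                new Event("u1", 58_000L, 2.0),
                new Event("u1", 30_000L, 4.0)) // out-of-order, potentially late record
            // Event time is read from the record itself; watermarks, however, advance as
            // records arrive in *processing* time, which Kafka does not retain.
            .assignTimestampsAndWatermarks(
                WatermarkStrategy.<Event>forBoundedOutOfOrderness(Duration.ofSeconds(5))
                    .withTimestampAssigner((event, ts) -> event.timestamp))
            .keyBy(event -> event.userId)
            .window(TumblingEventTimeWindows.of(Time.minutes(1)))
            // Whether the out-of-order record is included in the window result, triggers a
            // late re-firing, or is dropped depends on when the watermark passed the window
            // end in processing time; that information is lost once the job has run.
            .allowedLateness(Time.seconds(30))
            .sum("amount")
            .print();

        env.execute("feature-job-sketch");
    }
}
```

Replaying such a pipeline over historical Kafka data reproduces the event timestamps, but not the processing-time order in which the original job saw the records, so the late-record and window-firing behavior is not reproducible.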
I will then discuss assumptions we can make to get "good enough" backfill results while trying to avoid feature leakage. The best approach likely depends on your particular Kafka/Flink setup.