A streaming application is started once and then continuously ingests endless, fairly steady streams of events. That's as far as the theory goes.
Unfortunately, reality is more complicated. Over time, your application's ability to process large historical data sets robustly, efficiently, and correctly will be critical:
- for exploratory data analysis during development
- for bootstrapping the initial state of an application
- for back-filling following an outage or bugfix
- for keeping up with bursty input streams
These scenarios call for batch processing techniques. Apache Flink is as streaming-first as it gets. Yet over recent releases, the community has invested significant resources into unifying stream and batch processing across all layers of the stack, from the scheduler to the APIs.
In this talk, I'll introduce Apache Flink's approach to unified stream and batch processing and discuss, by example, how these scenarios can already be addressed today and what might be possible in the future.
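To make that unification concrete, here is a minimal sketch using the DataStream API's execution-mode switch (available since Flink 1.12): the same pipeline definition can run as a streaming job over unbounded input or as a batch job over bounded input, depending on a single setting. The word-count pipeline, class name, and sample data below are illustrative and not taken from the talk.

```java
import org.apache.flink.api.common.RuntimeExecutionMode;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class UnifiedWordCount {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // The only batch-specific line: run this otherwise unchanged pipeline
        // as a batch job over bounded input. Removing it (or choosing
        // STREAMING / AUTOMATIC) runs the same program as a streaming job.
        env.setRuntimeMode(RuntimeExecutionMode.BATCH);

        // Illustrative bounded input; in a back-filling scenario this would
        // be a bounded source over historical data.
        env.fromElements("flink", "batch", "flink", "stream", "flink")
           .map(word -> Tuple2.of(word, 1))
           .returns(Types.TUPLE(Types.STRING, Types.INT))
           .keyBy(t -> t.f0)
           .sum(1)
           .print();

        env.execute("unified-word-count");
    }
}
```

In batch mode the runtime can, for example, use blocking data exchanges and restart from intermediate results rather than relying on continuous checkpointing, which is where the scheduler-level side of the unification shows up.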
Presenter
Konstantin Knauf
Confluent

Konstantin is a member of the Apache Flink PMC, a long-term contributor to the project, and a group product manager at Confluent. He joined the company early this year as part of the acquisition of Immerok, which he had co-founded with a group of long-term community members the previous year. Formerly, as Head of Product at Ververica, Konstantin supported multiple teams working on Apache Flink in both discovery and delivery. Before that, he led the pre-sales team at Ververica, helping its clients as well as the open source community get the most out of Apache Flink.