Fundamentals of Stream Processing with Apache Beam

Kafka Summit 2016 | Systems Track


Tyler Akidau, Kafka Summit Program Committee, Software Engineer, Google

Frances Perry, Software Engineer, Google


Apache Beam (unified Batch and strEAM processing!) is a new Apache incubator project. It grew out of years of experience developing Big Data infrastructure within Google (such as MapReduce, FlumeJava, and MillWheel) and has now been donated to the OSS community at large.

Come learn about the fundamentals of out-of-order stream processing, and how Beam’s powerful tools for reasoning about time greatly simplify this complex task. Beam provides a model that allows developers to focus on the four important questions that must be answered by any stream processing pipeline:

  • What results are being calculated?
  • Where in event time are they calculated?
  • When in processing time are they materialized?
  • How do refinements of results relate?
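
To make the four questions concrete, here is a toy sketch in plain Python. It does not use the Beam SDK, and the names run_pipeline, window_size, and accumulating are hypothetical, chosen only for illustration. It assumes each event is a (value, event_time, processing_time) tuple, sums values (what), groups them into fixed event-time windows (where), emits a result pane each time an element arrives (when), and either accumulates or discards earlier results (how).

```python
from collections import defaultdict


def run_pipeline(events, window_size=60, accumulating=True):
    """Toy model of the four questions; not the Beam API.

    events: iterable of (value, event_time, processing_time) tuples.
    Returns a list of panes: (processing_time, window_start, emitted_value).
    """
    totals = defaultdict(int)  # running sum per event-time window
    panes = []

    # Elements are handled in arrival (processing-time) order, which may
    # differ from the order in which the events actually occurred.
    for value, event_time, processing_time in sorted(events, key=lambda e: e[2]):
        # WHERE in event time: assign the element to a fixed window.
        window_start = (event_time // window_size) * window_size

        # WHAT is calculated: a sum of the values in each window.
        totals[window_start] += value

        # WHEN in processing time: emit a pane on every arriving element.
        panes.append((processing_time, window_start, totals[window_start]))

        # HOW refinements relate: accumulating panes include earlier
        # elements; discarding panes report only what is new.
        if not accumulating:
            totals[window_start] = 0

    return panes
```

For example, with events [(5, 10, 100), (7, 70, 101), (3, 20, 102)], the third element arrives last but belongs to the first window. In accumulating mode the window starting at 0 emits 5 and then 8; in discarding mode it emits 5 and then 3.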

Furthermore, by cleanly separating these questions from runtime characteristics, Beam programs become portable across multiple runtime environments, both proprietary (e.g., Google Cloud Dataflow) and open-source (e.g., Flink and Spark).