Presentation

Fundamentals of Stream Processing with Apache Beam

Kafka Summit San Francisco 2016 | Systems Track

Apache Beam (unified Batch and strEAM processing!) is a new Apache incubator project. It grew out of years of experience building Big Data infrastructure at Google (including MapReduce, FlumeJava, and MillWheel) and has now been donated to the open-source community at large.

Come learn about the fundamentals of out-of-order stream processing, and how Beam’s powerful tools for reasoning about time greatly simplify this complex task. Beam provides a model that allows developers to focus on the four important questions that must be answered by any stream processing pipeline:

What results are being calculated?
Where in event time are they calculated?
When in processing time are they materialized?
How do refinements of results relate?
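
The four questions can be made concrete with a small sketch. What follows is plain Python, not the Beam API: the function names, the sample data, the fixed 10-unit windows, and the fire-on-every-element trigger are all assumptions chosen for illustration. It sums values (what), groups them into fixed event-time windows (where), emits a result pane each time an element arrives (when), and reports either the running total or only the new delta (how).

```python
from collections import defaultdict

def window_start(event_time, size):
    # Where: map an event timestamp to the start of its fixed window.
    return (event_time // size) * size

def run(arrivals, size, accumulating=True):
    """Process (event_time, value) pairs in arrival order (hypothetical helper)."""
    totals = defaultdict(int)
    panes = []
    for event_time, value in arrivals:
        w = window_start(event_time, size)
        # What: a per-window sum of the values.
        totals[w] += value
        # When: this sketch fires a pane after every element.
        # How: accumulating panes carry the running total,
        # discarding panes carry only the newly arrived delta.
        panes.append((w, totals[w] if accumulating else value))
    return panes, dict(totals)

# Elements arrive out of order with respect to their event times.
arrivals = [(12, 5), (3, 7), (14, 1), (8, 2)]

acc_panes, totals = run(arrivals, size=10, accumulating=True)
disc_panes, _ = run(arrivals, size=10, accumulating=False)

print(totals)      # final per-window sums
print(acc_panes)   # each pane refines the previous one for its window
print(disc_panes)  # each pane holds only the new contribution
```

Arrival order does not change the final per-window sums, because elements are grouped by event time rather than by when they show up. A real pipeline would typically drive the "when" decision with watermarks and triggers instead of firing on every element.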

Furthermore, because the model cleanly separates these questions from runtime characteristics, Beam programs are portable across multiple runtime environments, both proprietary (e.g., Google Cloud Dataflow) and open source (e.g., Apache Flink and Apache Spark).

Presenters

Tyler Akidau
Frances Perry
