
Presentation

Conquering All Your Stream Processing Needs with Kafka and Spark

Kafka Summit San Francisco 2016 | Systems Track

Apache Spark, specifically Spark Streaming, is becoming one of the most widely used stream processing systems for Kafka. At its heart, Spark is an extremely fast, general-purpose distributed data processing platform. This lets a single framework unify all kinds of data processing: streaming, SQL, and machine learning. For Kafka users, this means they can use Spark to run batch jobs, streaming pipelines, and interactive queries on Kafka data. In this talk, I am going to give a brief overview of the Spark framework and explain how different components of Spark can be used to process data from Kafka. Specifically, I am going to cover the following:

Real-time processing of Kafka streams with Spark Streaming
Batch and interactive querying of Kafka data with Spark and Spark SQL
Schema-aware streaming ETL with Streaming DataFrames
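A common thread in these topics is that Spark Streaming processes a Kafka stream as a series of small batches, so the same transformation can run as a batch job over stored data or incrementally over a live stream. The plain-Python sketch below illustrates that idea together with schema-aware parsing. It does not use Spark or a Kafka client. Records are modeled as hypothetical (offset, value) pairs carrying UTF-8 JSON, and SCHEMA, parse_row, etl, micro_batches, and run_streaming are illustrative names, not Spark APIs.

```python
import json
from typing import Any, Dict, Iterable, Iterator, List, Optional, Tuple

# Hypothetical Kafka-like record: (offset, raw value bytes).
Record = Tuple[int, bytes]

# Hypothetical schema for the JSON payload carried in each record value.
SCHEMA: Dict[str, type] = {"user": str, "action": str, "ts": int}


def parse_row(value: bytes, schema: Dict[str, type]) -> Optional[Dict[str, Any]]:
    """Decode one value and project it onto the schema.

    Returns None if the payload is not a JSON object or any schema field is
    missing or has the wrong type; fields outside the schema are dropped.
    """
    try:
        obj = json.loads(value.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        return None
    if not isinstance(obj, dict):
        return None
    row: Dict[str, Any] = {}
    for field, ftype in schema.items():
        v = obj.get(field)
        if type(v) is not ftype:  # exact match, so JSON true is not accepted as an int
            return None
        row[field] = v
    return row


def etl(records: Iterable[Record]) -> List[Dict[str, Any]]:
    """The shared transformation: keep only rows that conform to SCHEMA."""
    rows: List[Dict[str, Any]] = []
    for _offset, value in records:
        row = parse_row(value, SCHEMA)
        if row is not None:
            rows.append(row)
    return rows


def micro_batches(records: Iterable[Record], size: int) -> Iterator[List[Record]]:
    """Cut a record stream into consecutive batches.

    Spark Streaming cuts batches by time interval; a fixed record count is
    used here only to keep the sketch deterministic.
    """
    batch: List[Record] = []
    for record in records:
        batch.append(record)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def run_streaming(records: Iterable[Record], size: int) -> List[Dict[str, Any]]:
    """Apply the same etl() to each micro-batch and append to the output."""
    output: List[Dict[str, Any]] = []
    for batch in micro_batches(records, size):
        output.extend(etl(batch))
    return output


if __name__ == "__main__":
    log = [
        (0, b'{"user": "ann", "action": "click", "ts": 1}'),
        (1, b'{"user": "bob", "action": "view"}'),  # missing ts: rejected
        (2, b"not json"),  # malformed: rejected
        (3, b'{"user": "cy", "action": "buy", "ts": 4, "extra": 1}'),
    ]
    batch_result = etl(log)  # batch job over the stored log
    stream_result = run_streaming(log, size=2)  # streaming job, 2 records per batch
    print(batch_result == stream_result, len(batch_result))  # True 2
```

Keeping etl a pure function of its input batch is what makes the batch and streaming paths agree in this sketch; stateful operations such as running counts would additionally need state carried from one batch to the next.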

Presenter

Tathagata Das

Databricks
