Conquering All Your Stream Processing Needs with Kafka and Spark

On-demand recording

Kafka Summit 2016 | Systems Track

Apache Spark, specifically Spark Streaming, is becoming one of the most widely used stream processing systems for Kafka. At its heart, Spark is an extremely fast, general-purpose distributed data processing platform. This lets a single framework unify all kinds of data processing: streaming, SQL, and machine learning. For Kafka users, this means they can use Spark to run batch jobs, streaming pipelines, and interactive queries on Kafka data. In this talk, I will give a brief overview of the Spark framework and elaborate on how different components of Spark can be used to process data from Kafka. Specifically, I will cover the following:

  • Real-time processing of Kafka streams with Spark Streaming
  • Batch and interactive querying of Kafka data with Spark and Spark SQL
  • Schema-aware streaming ETL from Kafka with Streaming DataFrames


Tathagata Das, Software Engineer, Databricks
