Apache Spark, specifically Spark Streaming, is becoming one of the most widely used stream processing systems for Kafka. At its heart, Spark is an extremely fast, general-purpose distributed data processing platform. That generality lets a single framework unify many kinds of data processing: streaming, SQL, and machine learning. For Kafka users, this means they can use Spark to run batch jobs, streaming pipelines, and interactive queries on Kafka data. In this talk, I am going to give a brief overview of the Spark framework and elaborate on how the different components of Spark can be used to process data from Kafka. Specifically, I am going to cover the following.
Tathagata Das, Software Engineer, Databricks