We’ll start the talk with a live, interactive demo that generates audience-specific recommendations using NiFi, Kafka, and Spark (Streaming, SQL, ML, and GraphX).
Next, we’ll dive deep into the data flow between the key components. We use NiFi to track all data transformations through its “data provenance” capabilities. We use Kafka for its high throughput and high availability. And we use Spark Streaming to train our ML models incrementally, in real time, using a Streaming Matrix Factorization library from Databricks.
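To make the incremental training step concrete, here is a minimal sketch of matrix factorization updated one rating at a time with stochastic gradient descent. It is plain Python rather than Spark, and it is not the API of the Databricks library mentioned above. The class name `IncrementalMF`, its parameters (`rank`, `lr`, `reg`), and the sample ratings are all assumptions made for illustration. Each call to `update_batch` stands in for one streaming micro-batch of (user, item, rating) events, such as a batch consumed from a Kafka topic.

```python
import random


class IncrementalMF:
    """Hypothetical incremental matrix factorization trained by per-rating SGD."""

    def __init__(self, rank=8, lr=0.05, reg=0.02, seed=42):
        self.rank = rank          # number of latent factors (assumed value)
        self.lr = lr              # SGD learning rate (assumed value)
        self.reg = reg            # L2 regularization strength (assumed value)
        self.rng = random.Random(seed)
        self.user_factors = {}    # user id -> latent vector
        self.item_factors = {}    # item id -> latent vector

    def _init_vector(self):
        # Small random values so that new users and items start near zero.
        return [self.rng.uniform(-0.1, 0.1) for _ in range(self.rank)]

    def predict(self, user, item):
        p = self.user_factors.get(user)
        q = self.item_factors.get(item)
        if p is None or q is None:
            return 0.0            # no information yet for this user or item
        return sum(a * b for a, b in zip(p, q))

    def update(self, user, item, rating):
        """Apply one SGD step for a single rating; return the error before the step."""
        p = self.user_factors.setdefault(user, self._init_vector())
        q = self.item_factors.setdefault(item, self._init_vector())
        err = rating - sum(a * b for a, b in zip(p, q))
        for k in range(self.rank):
            pk, qk = p[k], q[k]
            p[k] += self.lr * (err * qk - self.reg * pk)
            q[k] += self.lr * (err * pk - self.reg * qk)
        return err

    def update_batch(self, ratings):
        """Consume one micro-batch of (user, item, rating) tuples."""
        for user, item, rating in ratings:
            self.update(user, item, rating)


if __name__ == "__main__":
    model = IncrementalMF()
    # Two made-up micro-batches; in a streaming job these would arrive over time.
    model.update_batch([("alice", "item-1", 5.0), ("bob", "item-1", 3.0)])
    model.update_batch([("alice", "item-2", 4.0), ("bob", "item-2", 1.0)])
    print(model.predict("alice", "item-1"))
```

The point of the sketch is that the model is never retrained from scratch: each new rating nudges only the two latent vectors it touches, which is what makes this style of training suitable for a stream.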
Lastly, we’ll discuss the latest Netflix Recommendations Pipeline, including its highly scalable Netflix Open Source components for ML model serving.
We’re going to cover a lot of material in a short amount of time. All of the slides, demos, and Docker images will be available at advancedspark.com.
Chris Fregly, Principal Data Solutions Engineer, IBM Spark Technology Center