Many data pipelines still default to nightly or hourly batch processing, yet data is generated continuously and should be available much sooner. Moving to stream processing adds complexity, but Spark Structured Streaming puts streaming within reach of teams of any size.
This session shares techniques for data engineers who are new to building streaming pipelines with Spark Structured Streaming. It covers how to implement real-time stream processing with Apache Spark and Apache Kafka. We will discuss the core concepts of Spark Structured Streaming alongside introductory code examples, and examine key streaming ideas such as triggers, windows, and state. To tie it all together, we will walk through a complete pipeline, including a demo using PySpark, Apache Kafka, and Delta Lake tables.