Many data pipelines still default to processing data nightly or hourly, yet information is created continuously and should be available much sooner. While the move to stream processing adds complexity, Spark Structured Streaming puts it within reach of teams of any size.
This session shares techniques for data engineers who are new to building streaming pipelines with Spark Structured Streaming. It covers how to implement real-time stream processing with Apache Spark and Apache Kafka. We will discuss the core concepts of Spark Structured Streaming along with introductory code examples, and look at important streaming concepts such as triggers, windows, and state. To tie it all together, we will walk through a complete pipeline, including a demo using PySpark, Apache Kafka, and Delta Lake tables.
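To give a flavor of the kind of pipeline the demo walks through, here is a minimal PySpark sketch that touches each of the concepts named above: a Kafka source, a stateful windowed aggregation, a processing-time trigger, and a Delta Lake sink. The topic name, broker address, message schema, and file paths are all hypothetical placeholders, and running it requires the Spark Kafka connector and Delta Lake packages on the classpath.

```python
# A minimal sketch, assuming a Kafka topic "events" on localhost:9092 and a
# Delta table at /tmp/delta/event_counts -- all names are illustrative.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json, window
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

spark = SparkSession.builder.appName("streaming-demo").getOrCreate()

# Assumed schema of the JSON payload carried in each Kafka message value.
schema = StructType([
    StructField("user_id", StringType()),
    StructField("event_type", StringType()),
    StructField("event_time", TimestampType()),
])

# Read the Kafka topic as an unbounded streaming DataFrame.
events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("subscribe", "events")
    .load()
    .select(from_json(col("value").cast("string"), schema).alias("e"))
    .select("e.*")
)

# Stateful windowed aggregation: count events per type in 5-minute windows.
# The watermark bounds the state so Spark can drop windows older than 10 minutes.
counts = (
    events
    .withWatermark("event_time", "10 minutes")
    .groupBy(window(col("event_time"), "5 minutes"), col("event_type"))
    .count()
)

# Write to a Delta table; the trigger fires a micro-batch every minute.
query = (
    counts.writeStream
    .format("delta")
    .outputMode("append")
    .option("checkpointLocation", "/tmp/delta/checkpoints/event_counts")
    .trigger(processingTime="1 minute")
    .start("/tmp/delta/event_counts")
)

query.awaitTermination()
```

The checkpoint location is what lets the query recover its state and Kafka offsets after a restart, which is why every production streaming query should set one.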