Presentation

Streaming Data Into Your Lakehouse

Current 2022

The last few years have taught us that cheap, virtually unlimited, and highly available cloud object storage alone doesn't make a solid enterprise data platform. Too many data lakes failed to meet expectations and degenerated into sad data swamps.

With the Linux Foundation OSS project Delta Lake (https://github.com/delta-io), you can turn your data lake into the foundation of a data lakehouse that brings back ACID transactions, schema enforcement, upserts, efficient metadata handling, and time travel.

In this session, we explore how a data lakehouse works with streaming, using Apache Kafka as an example.

This talk is for data architects who are not afraid of some code and for data engineers who love open source and cloud services.

Attendees of this talk will learn:

Lakehouse architecture 101, the honest tech bits
The data lakehouse and streaming data: what's there beyond Apache Spark™ Structured Streaming?
Why the lakehouse and Apache Kafka make a great couple and what concepts you should know to get them hitched with success.
Streaming data with declarative data pipelines: In a live demo, I will show data ingestion, cleansing, and transformation based on a simulation of the Data Donation Project (DDP, https://corona-datenspende.de/science/en) built on the lakehouse with Apache Kafka, Apache Spark™, and Delta Live Tables (a fully managed service).

DDP is a scientific IoT experiment that aims to detect COVID outbreaks in Germany by identifying elevated heart rates that correlate with infections. Half a million volunteers have already chosen to donate heart rate data from their fitness trackers.

Presenter

Frank Munz

Databricks

Dr. Frank Munz works on large-scale data and AI at Databricks. He authored three computer science books, built up technical evangelism for Amazon Web Services in Germany, Austria, and Switzerland, and once upon a time worked as a data scientist with a group that won a Nobel Prize.

Frank realized his dream to speak at top-notch conferences such as Devoxx, KubeCon, and JavaOne on every continent (except Antarctica, because it is too cold there). He holds a Ph.D. summa cum laude in Computer Science from TU Munich. He enjoys skiing in the Alps, tapas in Spain, and exploring secret beaches in SE Asia.
