
Presentation

Apache Kafka, Tiered Storage and TensorFlow for Streaming Machine Learning without a Data Lake

Kafka Summit 2020

Machine learning (ML) is split into two phases: model training and model inference. ML frameworks typically rely on a data lake such as HDFS or S3 to store and process historical data for training analytic models. However, a modern streaming architecture makes it possible to avoid such a data store entirely.

This talk compares a modern streaming architecture with traditional batch and big data alternatives and explains its benefits: a simpler architecture, the ability to reprocess events in the same order to train different models, and the possibility of building a scalable, mission-critical ML architecture for real-time predictions with far fewer headaches.
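
A minimal sketch of the replay idea, assuming a toy in-memory list as a stand-in for a single Kafka topic partition (Kafka guarantees ordering only within a partition). The names `PartitionLog` and `train_linear_sgd` are hypothetical, and plain-Python stochastic gradient descent stands in for TensorFlow. Because every training run reads from offset 0 and consumes records in the same order, two models with different hyperparameters see identical input, and repeating a run reproduces its result exactly.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple

Record = Tuple[float, float]  # (feature x, label y)


@dataclass
class PartitionLog:
    """Hypothetical in-memory stand-in for one Kafka topic partition."""
    records: List[Record] = field(default_factory=list)

    def append(self, record: Record) -> int:
        # Appending assigns the next sequential offset; existing records never change.
        self.records.append(record)
        return len(self.records) - 1

    def read_from(self, offset: int = 0) -> Iterator[Tuple[int, Record]]:
        # Reading deletes nothing, so every consumer can replay from any offset.
        for off in range(offset, len(self.records)):
            yield off, self.records[off]


def train_linear_sgd(log: PartitionLog, lr: float) -> Tuple[float, float]:
    """Fit y = w * x + b with one SGD pass over the log, starting at offset 0."""
    w, b = 0.0, 0.0
    for _, (x, y) in log.read_from(0):
        err = (w * x + b) - y
        w -= lr * err * x
        b -= lr * err
    return w, b


# Produce 200 noise-free events following y = 2x + 1.
log = PartitionLog()
for i in range(200):
    x = (i % 10) / 10.0
    log.append((x, 2.0 * x + 1.0))

# Two different models trained independently from the same ordered history.
model_fast = train_linear_sgd(log, lr=0.1)
model_slow = train_linear_sgd(log, lr=0.01)
print(model_fast, model_slow)
```

With a real cluster, each training job would typically use its own consumer group and start from the earliest retained offset (for example with `auto.offset.reset=earliest`), so adding a new model does not disturb existing consumers.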

The talk explains how to achieve this by leveraging Apache Kafka, Tiered Storage and TensorFlow.
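
Tiered Storage is intended to make long retention affordable: older log segments are offloaded to object storage while recent segments stay on broker disks, and clients keep reading by offset without knowing which tier serves a record. The toy model below illustrates only that idea; `TieredLog`, its segment sizes and the dict standing in for object storage are hypothetical and do not reflect how Confluent implements the feature.

```python
from typing import Dict, List


class TieredLog:
    """Hypothetical toy: recent segments stay local, older ones move to a remote tier."""

    def __init__(self, segment_size: int, local_segments: int) -> None:
        self.segment_size = segment_size
        self.local_segments = local_segments    # newest segments kept on local disk
        self.local: Dict[int, List[str]] = {}   # base offset -> records
        self.remote: Dict[int, List[str]] = {}  # base offset -> records (stands in for S3)
        self.next_offset = 0

    def append(self, record: str) -> int:
        # Records go into the segment whose base offset covers the next offset.
        base = self.next_offset - (self.next_offset % self.segment_size)
        self.local.setdefault(base, []).append(record)
        offset = self.next_offset
        self.next_offset += 1
        self._offload()
        return offset

    def _offload(self) -> None:
        # Move the oldest (already closed) segments to the remote tier when local is full.
        while len(self.local) > self.local_segments:
            oldest = min(self.local)
            self.remote[oldest] = self.local.pop(oldest)

    def read(self, offset: int) -> str:
        # Reads are by offset; the caller never chooses a tier.
        base = offset - (offset % self.segment_size)
        segment = self.local.get(base)
        if segment is None:
            segment = self.remote[base]
        return segment[offset - base]


tiered = TieredLog(segment_size=100, local_segments=2)
for i in range(1000):
    tiered.append(f"event-{i}")

# Replay the complete history from offset 0, across both tiers.
history = [tiered.read(o) for o in range(tiered.next_offset)]
print(len(tiered.local), len(tiered.remote), history[0], history[-1])
```

A model-training job can therefore replay the full history by offset, while inference consumers keep reading the newest, locally held segments.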

Presenter

Kai Waehner

Confluent

Kai is Global Field CTO at Confluent. His areas of expertise include big data analytics, machine learning, messaging, integration, microservices, the Internet of Things, stream processing and blockchain. He also writes technical articles, speaks at international conferences and shares his experience with new technologies on his blog (www.kai-waehner.de/blog).
