Presentation

Build Real-time Machine Learning Apps on Generative AI with Kafka Streams

« Current 2023

The rise of large language models (LLMs) such as GPT-4 and of generative AI has changed how data and ML teams work: traditionally, they engineered their own features and trained models in batch on large sets of historical data. Since training proprietary LLMs is no longer an affordable option for most organizations, running inference on foundation models via APIs seems to be the natural way to consume them, which poses new challenges for the architecture of real-time apps and workflows.

In this talk we show how data and machine learning teams can rapidly prototype and deploy real-time ML apps, ingesting real-time data with the help of Apache Kafka® and Airy, an open-source app framework. We will discuss different options for fine-tuning LLMs and "chaining" them with other ML models at inference time in a microservices architecture built on Kafka Streams and Kubernetes. We will also discuss how streaming can enable dynamic features for ML models and for prompt engineering when integrating with generative AI.

At the end of the talk we will give an outlook on dynamically retraining machine learning models in real time from streaming and batch sources, using Ray and Kubernetes to spin up GPU node pools for model training on demand. In this context, we will also discuss how event streaming can be used for reinforcement learning from human feedback (RLHF) to improve the accuracy of predictions and make ML models more robust over time.