Kafka Summit London 2023

View sessions and slides from Kafka Summit London 2023.


Beyond Speed: Kafka Summit London 2023 Keynote

Apache Kafka® co-creator Jay Kreps and guest speakers highlight Kafka’s past, present, and future, including upcoming improvements to make open source Kafka simpler to use, easier to manage, and more reliable.

Breakout Sessions

A Beginner’s Guide to Kafka Performance in Cloud Environments

  • Steffen Hausmann, Materialize

In this session, we’ll take a look at Kafka performance from an infrastructure perspective. How does your choice of storage, compute, and networking affect cluster throughput? How can you optimize for low cost or fast recovery? When is it better to scale up rather than to scale out brokers?

A Hitchhiker's Guide to Apache Kafka®

  • Lucia Cerchie, Confluent

We'll start off by understanding the importance of events and why we'd even want to build systems with them. Taking the concept of a key-value pair to model events, we’ll explore topics, partitioning, and replication, and look at how to use the Producer and Consumer APIs.

A Kafka Client’s Request: There and Back Again

  • Danica Fine, Confluent

By the end of this session, you’ll know the ins and outs of the read and write requests that your Kafka clients make, making your next debugging or performance analysis session a breeze.

A Practical Guide To End-to-End Tracing In Event Driven Architectures

  • Roman Kolesnev, Confluent

This talk will cover the following topics:

  • Distributed tracing concepts, including context propagation and the OpenTelemetry implementation stack
  • OpenTelemetry’s Kafka instrumentation, what is supported out of the box, code examples, edge cases, challenges and solutions

An Introduction to Kafka Cruise Control

  • Viktor Somogyi-Vass, Cloudera

Join this session if you want to learn how to use Cruise Control to automate Kafka cluster management and make your team’s life easier.

Apache Kafka®: A Year in Review and A Look Forward

  • Amanda Gilbert, Confluent
  • Bruno Cadonna, Confluent

Join us as we take a look back at the last year in Kafka with members of our Apache Kafka committee. We will review some of the most influential KIPs and talk about the upcoming changes to expect in the project.

Apples and Oranges - Comparing Kafka Streams and Flink

  • Bill Bejeck, Confluent

In this talk, attendees will learn the information needed to match their event streaming requirements and objectives with the correct streaming framework. You'll leave with the knowledge of both Kafka Streams and Flink's strengths and weaknesses.

Better Integration Tests for Kafka Applications with Testcontainers

  • Oleg Šelajev, AtomicJar

In this session, we explore how Testcontainers libraries allow you to programmatically create, configure, and manage the lifecycle of ephemeral instances of Kafka. From spinning up individual Kafka services to creating complex cluster topologies, your tests control the environment they require and run.

Build an Event-driven Microservices with Apache Kafka & Apache Flink

  • Ali Alemi, Amazon Web Services

In this talk, learn how to decouple the communication between disparate microservices using Apache Kafka and manage the state of the events separately using Apache Flink Stateful functions.

Building Real-time Push APIs Using Kafka as the Customer Facing Interface

  • Javier Moreno Molina, Mercedes Benz Connectivity Services GmbH

The talk will be about how we have successfully used Kafka as the customer facing interface for our Push API, achieving the Best Automotive API 2022 (API:World) and our lessons learned after 2 years in production.

Case-Study: Building Real-Time Applications at Scale-Cyclist Crash Detection

  • Tomas Neubauer, Quix

Using telemetry data collected from a fitness app, we’ll demonstrate how we used a combination of Apache Kafka and Python-based microservices running on Kubernetes to build a pipeline for processing and analyzing this data in real-time.

Chill, Distill, No Overkill: Best Practices to Stress Test Kafka

  • Siva Kunapuli, Confluent

In the session, we will cover ways to:

  • Define parameters and variables before beginning
  • Accommodate for changing conditions - brokers, applications, config, network
  • Overlap infrastructure, test design, latency, and throughput

Consistent, High-throughput, Real-time Calculation Engines Using Kafka Streams

  • Kamlesh Shah, Morgan Stanley

During this session, I will cover how, at Morgan Stanley, we built a real-time, microservices based Liquidity Management platform using event streaming with Kafka Streams API, to tackle high volumes of data and to perform calculations on cross domain events, spanning wide time windows.

Dataflows for Machine Learning Operations

  • Alex Rakowski, Seldon Technologies Ltd
  • Andrei Paleyes, University of Cambridge

In this talk, we identify dataflow architectural principles to address these demands and discuss their application in an open-source ecosystem. We show how to create a decentralized dataflow engine underpinned by Kafka and the Kafka Streams client library.

Designing a Data Mesh with Kafka

  • Paul Makkar, Saxo Bank
  • Rahul Gulati, Saxo Bank

In this talk we will describe how we managed to apply Data Mesh founding principles to our operational plane, based on Kafka. Consequently, we have gained value from these principles more broadly than just analytics. An example of this is treating data as-a-product.

Don’t Let Kafka Be A Cluster: Kafka Chaos Experimentation

  • Justin Fetherolf, Verica

Attendees learn in detail how real world events were varied for the experiment, including design goals, hard trade-offs, and safety mechanisms necessary for the load tool to adhere to Chaos Engineering principles. We show how the results were analyzed to support or debunk the hypothesis.

End-to-end Streaming Between gRPC Services Via Kafka

  • John Fallows, Aklivity

This session is targeted towards developers interested in learning how to integrate gRPC with Kafka event streaming securely, reliably, and scalably.

Exactly-once Stream Processing Done Right

  • Matthias J. Sax, Confluent

In this talk, we will dive into technical details to shed some light on the above questions. We approach the topic from a conceptual point of view, explain the challenges Kafka Connect faces when it comes to exactly-once, and discuss how external source and sink systems can be integrated.

Exactly-Once, Again: Adding EOS Support for Kafka Connect Source Connectors

  • Chris Egerton, Aiven

In this talk, learn how KIP-618 made exactly-once source connectors possible. Topics covered will include an overview of exactly-once support in Kafka’s client libraries, a brief refresher on the source connector API, and a deep dive into some of the internal workings of Kafka Connect.

From Monolithic Orchestrator to Streaming with Microservices

  • Olivier Jauze, Michelin
  • Valerie Servaire, Michelin

This project was all about replacing a huge and complex Business Process Management tool, an orchestrator of our internal logistic flows. And when we say huge, we really mean it: more than 24 processes and 150 million tyres moved, representing €10 billion of Michelin turnover.

Highly Available Kafka Consumers and Kafka Streams on Kubernetes

  • Adrian McCague, Zopa Bank

In this talk I will introduce a simple consumer implementation with a default configuration and discuss the KIPs and features that have been introduced over time to limit how the hostile world of cloud computing can impact your real-time consuming applications.

How to Isolate Tenants in a Data Distribution Platform

  • Joanna Eriksson, Schibsted AS

I will walk you through how we have achieved tenant isolation in our architecture where we have isolated tenants on an architecture level through Kafka topics and on a software level through threads. We have successfully used this design for years but as with all designs, it has its limitations.

Implementing Exactly-once Delivery and Escaping Kafka Rebalance Storms

  • Yulia Antonovsky, Akamai

In this talk, we’ll walk you through how we implemented exactly-once delivery with Kafka by managing Kafka transactions the right way, and how we escaped endless rebalance storms when running hundreds of consumers on the same Kafka topic.

Implementing Real-Time Analytics with Kafka Streams

  • Ramin Gharib, bakdata GmbH

In our talk, we discuss different approaches and highlight an indexing strategy for guaranteeing the order of a range query. We will discuss the pros and cons and finally demonstrate a real-world example of our solution.

Integrating Sparkplug IoT Edge of Network Nodes with Kafka

  • Yves Kurz, PAUL Tech AG

In this session, we have a look at a real-world IoT project in which hundreds of residential building complexes are equipped with thousands of sensors and actuators that communicate via Kafka to an optimization system to reduce energy consumption and eventually help to protect our planet.

Is Pseudonymization The Answer To Your GDPR Problems?

  • Pieter van der Meer, Dataworkz

In order to have a scalable solution we are using the standard Kafka connect architecture with a sink as the base. Records that are received by this sink are pseudonymized and in the meantime, a second record is created.

Kafka Streams Rebalances and Assignments: The Whole Story

  • Alieh Saeedi, Confluent
  • John Roesler, Confluent

This talk aims to completely demystify the system from an operational perspective. You will learn what is really happening across Kafka and Kafka Streams, how to interpret the logs and metrics, and how to adjust the configs to achieve your desired outcomes.

Lessons Learned Scaling Stateful Kafka Streams Topologies

  • Ferran Galí i Reniu, LIFULL Connect

I will explain a number of actions we took that helped us scale our topologies to process hundreds of millions of listings: use Kubernetes StatefulSets, tune RocksDB configurations, use Horizontal Pod Scaling wisely, activate consumer Rack Awareness, and more.

Migrating Your System to the Different Kafka Platforms Available

  • Bhavesh Sooka, Synthesis Software Technologies
  • Jonathan Lew, Synthesis Software Technologies

In this talk we will go over our experience in migrating all these technologies and the pitfalls and powerups that we encountered along the way.

MirrorMaker: Beyond the Basics

  • Mickael Maison, Red Hat

At the end of the session you will understand the capabilities of MirrorMaker and the process of building powerful mirroring scenarios with this tool.

Observability of Streaming Applications

  • Kosta Chuturkov, ING
  • Tim van Baarsen, ING Bank

During a step-by-step demo, we will look into different real-life examples and scenarios to demonstrate how to bring the observability of your Kafka applications to the next level.

Playing with Xbox Data

  • Dale Lane, IBM

A fun introduction to the world of Kafka Connect and Kafka Streams by using it to process data from Xbox Live. Gaming is a social activity, so Xbox includes a social aspect. Details about what games you play and when you’re playing are shared with your friends on the Xbox Live service.

Pragmatic Patterns (and Pitfalls) for Event Streaming in Brownfield Environments

  • Anna McDonald, Confluent

Armed with these pragmatic best practices, you will be able to successfully bring eventing into your stack and avoid turning your brownfield…into a minefield.

Pushing Apache Kafka to the Limit

  • Amanda Gilbert, Confluent
  • Anna McDonald, Confluent
  • Daan Gerits
  • Kai Waehner, Confluent
  • Kamlesh Shah, Morgan Stanley

We can all agree that Apache Kafka is an incredibly useful and powerful technology that’s found its way into the heart of a number of companies spanning numerous industries. To the untrained eye, there’s little that Kafka can’t do. But as with any technology, there are limitations.

Real-time Event Processing with Python

  • Dave Klein, Tabular

In this session, we will look at several techniques that we can use to build real-time applications with just Apache Kafka and the confluent-kafka Python package. You may be surprised at how far we can go with these simple tools, but we will also discuss the challenges you might face.

Real-time Fraudulent Trips Detection

  • Xueyao Jiang, FREENOW

In this talk, we are going to focus on how FREENOW builds an event stream processing pipeline with Kafka Streams and Kafka Connect on Kubernetes to detect fraudulent trips in real time based on GPS locations.

Reducing Impact of Single Broker Failures in Kafka

  • Michelle Valentinova, New Relic

We go in depth into the different scenarios that allow this to happen, the configuration we had chosen while hoping for the best, which made these outages possible or worse, and what we did to reduce the impact and still keep Kafka configured as desired.

Reliable Message Reprocessing Patterns for Kafka

  • Dunith Dhanushka, Redpanda

This talk discusses several error-handling patterns you can implement in Kafka consumer applications. We will explore different approaches to handling transient and non-transient errors and highlight the use of dead letter topics in Kafka for message reprocessing.

Restoring Restoration's Reputation in Kafka Streams

  • Bruno Cadonna, Confluent
  • Lucas Brutschy, Confluent

In this talk, we will explain how Kafka Streams currently restores local state and processes records. We will show how we decouple processing from restoring by moving restoration to a dedicated thread and how throughput profits from this decoupling.

Running Kafka as a Native Binary Using GraalVM

  • Ozan Günalp, Red Hat

In this session, we will talk about kafka-native, which leverages GraalVM native image for compiling Kafka broker to native executable using Quarkus framework. After going through some implementation details, we will focus on how it can be used in a Docker container with Testcontainers.

Shipping 1000+ Streaming Data Pipelines To Production

  • Hakan Lofcali, DataCater
  • Stefan Sprenger, DataCater

We walk you through our journey of adopting Apache Kafka®, Kafka Connect, and Kafka Streams. We discuss the challenges that we faced and how we overcame them. Over the course of the talk, we provide answers to important questions.

Storage Capacity Management on Multi-tenant Kafka Cluster

  • Nurettin Omeroglu

In this end-to-end story, I will present what the issues were at the beginning, how we came up with a plan, how we designed, implemented, and smoothly applied it to our existing clusters, and how clients can now monitor and even get alerted before their reserved capacity has been reached.

Streaming Infrastructure at Wise

  • Levani Kokhreidze, Wise

In this talk, we will discuss the technical details behind Wise's stream processing platform, such as security, how we run Apache Kafka brokers on Kubernetes, Kafka Streams applications deployment model with high availability, and different self-service tools we have developed.

The Dark and Dirty Side of Fixing Uneven Partitions

  • Olena Babenko, Aiven
  • Olena Kutsenko, Aiven

Come to this talk to learn what to do when the data distribution across topic partitions is badly broken and, as a result, significantly hurts the performance of consuming applications, increasing lag and slowing data processing.

The Next Generation of the Consumer Rebalance Protocol

  • David Jacot, Confluent

This talk will unveil the next generation of the consumer rebalance protocol for Apache Kafka (KIP-848) that addresses the shortcomings of the current protocol.

The Possibilities and Pitfalls of Writing Your Own State Stores

  • Daan Gerits

Join me in a journey of ups and downs that starts with a simple requirement (host an API), through implementing a custom state store and finishes off by describing the challenges we encountered getting our APIs deployed. Don’t expect all “roses and sunshine”.

Timely Auto-Scaling of Kafka Streams Pipelines with Remotely Connected APIs

  • Torben Meyer, bakdata GmbH

In this talk, we first give an overview of the caveats when integrating such services in Kafka Streams and basic approaches for mitigating those. Second, we present our solution for the timely scaling of complex Kafka Streams pipelines in conjunction with remotely connected APIs.

Unveiling the Inner Workings of Apache Kafka® with Flamegraphs

  • Christo Lolov, Amazon
  • Divij Vaidya, Amazon

In this talk, we will explore the use of flamegraphs as a tool for understanding the internals of Apache Kafka and for identifying performance issues. Flamegraphs are a visualization technique that allows you to see the relative usage of CPU and memory by different functions in a program.

Upleveling Analytics with Kafka

  • Amy Chen, dbt Labs

In this session, I will cover my personal journey of how I went from 8000 lines of Excel to learning about Kafka and incorporating it into my analytics pipelines. I’ll explore what topics (pun-intended) an analyst should know about Kafka to build an end-to-end analytics pipeline.

Using Machine Learning to Govern Kafka Clients

  • Shu Wang, Fidelity Investments
  • UmaMahesh Sistu, Fidelity Investments

You will learn ideas for client governance and linting of Kafka client applications as part of this talk. Kafka client governance is essential for the smooth operation of a financial services organization and for maintaining the trust of its customers.

Versioned State Stores in Kafka Streams

  • Victoria Xia, Confluent

This talk will introduce versioned state stores starting from the basics, discuss the stream-table join use case as motivation, cover operational considerations for users who'd like to use them, and briefly touch on implementation.

Where's My Message?

  • David Navalho, Marionete
  • Ricardo Henriques, Marionete

Join us on a journey where we will share our hard-earned experience, as well as how to tackle that dreaded question which just seems to keep popping up: "where's my message?" Did it ever reach Kafka? Can Kafka really lose a message? Is Kafka down?

You Put *What* in Your Stream?! Patterns and Practices for Event Design

  • Adam Bellemare, Confluent

This talk will explore the real-time analytics technology space from the perspective of the software developer that wants real-time insights in their software. We’ll cover the main categories, how these technologies work and their strengths and weaknesses.

You've Got Mail!

  • Michael van der Haven, CGI

If you’re interested in a little bit of hardcore tech and how event-driven architecture works at massive scale in a highly secure, GDPR-compliant environment, then this talk is for you!

Lightning Talks

Apache Flink on Kafka: Reliable Data Pipelines Everyone Can Code

  • Ela Demir, Vodafone

In this session, you can find out how to build crazy fast stream data pipelines using Apache Flink® over Kafka. Apache Flink® is a distributed stream processing engine that can be used with Kafka's ability to handle high volume, high throughput, and low latency data streams.

Balance Kafka Cluster with Zero Data Movement

  • Haochen Li, Apple
  • Yaodong Yang, Apple

Load balancing is a key factor in achieving high performance and cost efficiency for Kafka clusters. It helps reclaim over-provisioned resources caused by skewed brokers, whether CPU, memory, or disk storage.

Bench, a Framework for Benchmarking Kafka Using K8s and OpenMessaging Benchmark

  • Sky Kistler, Reddit

In this lightning talk session, we will discuss a messaging benchmark tool developed at Reddit called Bench. Bench quantifies the cost-performance trade-offs of various configurations of messaging systems.

Deep Dive into Kafka Connect Protocol

  • Catalin Pop, Confluent

In this presentation, we will answer the above questions and cover the following topics:

  • Breaking down how a Connector/task is created
  • Breaking down each Kafka Connect protocol (eager, compatible, sessioned)
  • Walk through rebalances for each protocol
  • Pros/cons of each protocol

Drift Detection with a Low Memory Footprint for ML Models on Kafka Streams

  • Alessandro Conflitti, Radicalbit

The main advantage of our solution lies in its very low memory footprint: while such a feature is important for any computing solution, it is especially valuable in situations where hundreds of messages per second are received, like ours.

Eliminating the Double Write Problem in Apache Kafka Using the Outbox Pattern

  • Rafael Roman, N26

In this talk, we will discuss the double write problem in Apache Kafka and how the outbox pattern can be implemented to solve it. We will also demonstrate the use of the outbox pattern in a sample Kafka application and show how it can be used to ensure data consistency and integrity.

Intelligent, Automatic Restarts for Unhealthy Kafka Consumers on Kubernetes

  • Chris Shepherd, Cloudflare

At Cloudflare we are big Kafka adopters and we run Kafka at a massive scale. We deploy our microservices leveraging Kafka on Kubernetes and we have some interesting experience on how to keep the latter operational to avoid downtime.

How to Make Your Data Scientists Love Real-time

  • Ralph M. Debusmann

In this session, we show you how can be used to bring the two disparate worlds of files and streaming together - and thus not only save a lot of time and money hiring real-time and streaming experts, but also make your data scientists start loving real-time.

Manage Consistent Configurations Across Multiple Kafka Environments

  • Nagashree B, Fidelity Investments
  • S Vinod Kumar, Fidelity Investments

In this talk, we will discuss how Fidelity Investments modeled a unique API to seamlessly lift and shift application topics, ACLs, quotas, and every other entity from lower environment to higher environment clusters.

Safeguarding - Protecting Your Kafka from Misbehaving Clients

  • Tom Scott

Join us for a detective hunt as we discuss tools and metrics used to detect misconfigurations in clients, how to address them once discovered, and ways in which to ensure that new occurrences are prevented from arising in the future.

Taming Kafka Connect with kcctl

  • Gunnar Morling, Decodable

Join us for this live demo to see kcctl in action, also touching on some advanced tricks like templating and setting up multiple connectors at once using jsonnet. You'll learn how kcctl sparks joy and boosts your productivity when interacting with Kafka Connect from your shell.

Testing SMTs? Testcontainers to the Rescue!

  • Fábio Sequeira, Marionete
  • Mafalda Santos, Marionete

Join us for a lightning session where we'll show you how to do proper integration testing for custom SMTs, using Testcontainers as the best, most accurate, and solid way to test these complex Kafka Connect components.

The Cost of Kafka’s High Availability on Cloud

  • Geetha Anne, Confluent

In this session, let us demystify the cost of high availability and explore methods to potentially reduce your cloud spend on Kafka clusters.

What to do if Your Kafka Streams App Gets OOMKilled?

  • Andrey Serebryanskiy, Raiffeisen Bank

Have you ever had your stateful Kafka Streams app killed by Kubernetes with the termination reason "OOMKilled"? Even if you did set up a JVM heap limit, the pod still got killed? This is likely due to your RocksDB off-heap memory usage. This talk will explore ways of diagnosing the problem.

Why You Need an Event Stream Registry

  • Robert Manteghi, Babylon Health

During this presentation, we will delve into the ways in which Event Stream Registry, a dataset that outlines the intended state of an event-driven system, can tackle these common challenges.