View sessions and slides from Kafka Summit London 2023.
Apache Kafka® co-creator Jay Kreps and guest speakers highlight Kafka’s past, present, and future, including upcoming improvements to make open source Kafka simpler to use, easier to manage, and more reliable.
In this session, we’ll take a look at Kafka performance from an infrastructure perspective. How does your choice of storage, compute, and networking affect cluster throughput? How can you optimize for low cost or fast recovery? When is it better to scale brokers up rather than out?
We'll start off by understanding the importance of events and why we'd even want to build systems with them. Taking the concept of a key-value pair to model events, we’ll explore topics, partitioning, and replication, and look at how to use the Producer and Consumer APIs.
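To make that concrete, here is a minimal sketch (not from the session) of the Producer and Consumer APIs, assuming a local broker and a hypothetical "events" topic:

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.kafka.common.serialization.StringSerializer;

public class HelloKafka {
  public static void main(String[] args) {
    Properties producerProps = new Properties();
    producerProps.put("bootstrap.servers", "localhost:9092");
    producerProps.put("key.serializer", StringSerializer.class.getName());
    producerProps.put("value.serializer", StringSerializer.class.getName());

    // The key determines the partition: records with the same key land on the
    // same partition, which preserves their relative order.
    try (KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps)) {
      producer.send(new ProducerRecord<>("events", "sensor-42", "temperature=21.5"));
    }

    Properties consumerProps = new Properties();
    consumerProps.put("bootstrap.servers", "localhost:9092");
    consumerProps.put("group.id", "demo-group");
    consumerProps.put("auto.offset.reset", "earliest");
    consumerProps.put("key.deserializer", StringDeserializer.class.getName());
    consumerProps.put("value.deserializer", StringDeserializer.class.getName());

    try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps)) {
      consumer.subscribe(List.of("events"));
      ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(5));
      for (ConsumerRecord<String, String> record : records) {
        System.out.printf("partition=%d key=%s value=%s%n",
            record.partition(), record.key(), record.value());
      }
    }
  }
}
```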
By the end of this session, you’ll know the ins and outs of the read and write requests that your Kafka clients make, making your next debugging or performance analysis session a breeze.
This talk will cover the following topics:
Join this session if you want to learn how to use Cruise Control to automate Kafka cluster management and make your team’s life easier.
Join us as we take a look back at the last year in Kafka with Apache Kafka committers. We will review some of the most influential KIPs and talk about the upcoming changes to expect in the project.
In this talk, attendees will learn how to match their event streaming requirements and objectives with the right streaming framework. You'll leave knowing both Kafka Streams' and Flink's strengths and weaknesses.
In this session, we explore how the Testcontainers libraries allow you to programmatically create, configure, and manage the lifecycle of ephemeral Kafka instances. From spinning up individual Kafka services to creating complex cluster topologies, your tests control the environment they require and run in.
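As a minimal sketch of the single-broker case (image tag and topic name are illustrative), the container's lifetime is scoped to a try-with-resources block:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;
import org.testcontainers.containers.KafkaContainer;
import org.testcontainers.utility.DockerImageName;

public class KafkaContainerSketch {
  public static void main(String[] args) {
    // Spins up one ephemeral broker in Docker; the container is torn down
    // automatically when the try-with-resources block ends.
    try (KafkaContainer kafka =
             new KafkaContainer(DockerImageName.parse("confluentinc/cp-kafka:7.4.0"))) {
      kafka.start();

      Properties props = new Properties();
      // The bootstrap address is only known after the container has started.
      props.put("bootstrap.servers", kafka.getBootstrapServers());
      props.put("key.serializer", StringSerializer.class.getName());
      props.put("value.serializer", StringSerializer.class.getName());

      try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
        producer.send(new ProducerRecord<>("test-topic", "k", "v"));
      }
    }
  }
}
```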
In this talk, learn how to decouple the communication between disparate microservices using Apache Kafka and manage the state of the events separately using Apache Flink Stateful Functions.
The talk will be about how we have successfully used Kafka as the customer-facing interface for our Push API, which won Best Automotive API 2022 (API:World), and the lessons we learned after two years in production.
Using telemetry data collected from a fitness app, we’ll demonstrate how we used a combination of Apache Kafka and Python-based microservices running on Kubernetes to build a pipeline for processing and analyzing this data in real time.
In the session, we will cover ways to:
During this session, I will cover how, at Morgan Stanley, we built a real-time, microservices-based Liquidity Management platform using event streaming with the Kafka Streams API to tackle high volumes of data and to perform calculations on cross-domain events spanning wide time windows.
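For a flavor of what a wide-window calculation can look like (a minimal sketch, not Morgan Stanley's code; topic names and window sizes are hypothetical):

```java
import java.time.Duration;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.TimeWindows;

public class WideWindowSketch {
  public static void main(String[] args) {
    StreamsBuilder builder = new StreamsBuilder();
    // Hypothetical topic of (accountId, amount) events.
    builder.stream("cash-movements", Consumed.with(Serdes.String(), Serdes.Double()))
        .groupByKey(Grouped.with(Serdes.String(), Serdes.Double()))
        // A wide, multi-day tumbling window; the grace period admits late events.
        .windowedBy(TimeWindows.ofSizeAndGrace(Duration.ofDays(7), Duration.ofHours(6)))
        .reduce(Double::sum)
        .toStream()
        .foreach((window, total) ->
            System.out.printf("%s %s -> %.2f%n", window.key(), window.window(), total));
    // builder.build() would then be run with a configured KafkaStreams instance.
  }
}
```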
In this talk, we identify dataflow architectural principles to address these demands and discuss their application in an open-source ecosystem. We show how to create a decentralized dataflow engine underpinned by Kafka and the Kafka Streams client library.
In this talk we will describe how we managed to apply Data Mesh founding principles to our operational plane, based on Kafka. As a result, we have gained value from these principles well beyond analytics; one example is treating data as a product.
Attendees learn in detail how real-world events were varied for the experiment, including design goals, hard trade-offs, and the safety mechanisms necessary for the load tool to adhere to Chaos Engineering principles. We show how the results were analyzed to support or debunk the hypothesis.
This session is targeted at developers interested in learning how to integrate gRPC with Kafka event streaming securely, reliably, and scalably.
In this talk, we will dive into technical details to shed some light on the above questions. We approach the topic from a conceptual point of view, explain the challenges Kafka Connect faces when it comes to exactly-once semantics, and discuss how external source and sink systems can be integrated.
In this talk, learn how KIP-618 made exactly-once source connectors possible. Topics covered will include an overview of exactly-once support in Kafka’s client libraries, a brief refresher on the source connector API, and a deep dive into some of the internal workings of Kafka Connect.
This project was all about replacing a huge and complex Business Process Management tool, an orchestrator of our internal logistics flows. And when we say huge, we really mean it: more than 24 processes and 150 million tyres moved, representing €10 billion of Michelin turnover.
In this talk I will introduce a simple consumer implementation with a default configuration and discuss the KIPs and features that have been introduced over time to limit how the hostile world of cloud computing can impact your real-time consuming applications.
I will walk you through how we achieved tenant isolation: at the architecture level through Kafka topics and at the software level through threads. We have used this design successfully for years, but as with all designs, it has its limitations.
In this talk, we’ll walk you through how we implemented exactly-once delivery with Kafka by managing Kafka transactions the right way, and how we escaped endless rebalance storms when running hundreds of consumers on the same Kafka topic.
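For illustration (not the speakers' code), a minimal consume-process-produce loop using Kafka transactions might look like this, with hypothetical topic names:

```java
import java.time.Duration;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.TopicPartition;

public class ExactlyOnceSketch {
  public static void main(String[] args) {
    Properties pp = new Properties();
    pp.put("bootstrap.servers", "localhost:9092");
    pp.put("transactional.id", "my-app-tx-1"); // must be stable per producer instance
    pp.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
    pp.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

    Properties cp = new Properties();
    cp.put("bootstrap.servers", "localhost:9092");
    cp.put("group.id", "my-app");
    cp.put("isolation.level", "read_committed"); // never read aborted transactional data
    cp.put("enable.auto.commit", "false");
    cp.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
    cp.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

    try (KafkaProducer<String, String> producer = new KafkaProducer<>(pp);
         KafkaConsumer<String, String> consumer = new KafkaConsumer<>(cp)) {
      producer.initTransactions();
      consumer.subscribe(List.of("input"));
      while (true) {
        ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
        if (records.isEmpty()) continue;
        producer.beginTransaction();
        try {
          for (ConsumerRecord<String, String> r : records) {
            producer.send(new ProducerRecord<>("output", r.key(), r.value().toUpperCase()));
          }
          // Commit consumed offsets inside the SAME transaction as the produced
          // records, so the read and the write are atomic together.
          producer.sendOffsetsToTransaction(nextOffsets(records), consumer.groupMetadata());
          producer.commitTransaction();
        } catch (Exception e) {
          producer.abortTransaction();
          // A real app would also rewind the consumer (seek) before retrying.
        }
      }
    }
  }

  static Map<TopicPartition, OffsetAndMetadata> nextOffsets(
      ConsumerRecords<String, String> records) {
    Map<TopicPartition, OffsetAndMetadata> out = new HashMap<>();
    for (TopicPartition tp : records.partitions()) {
      List<ConsumerRecord<String, String>> prs = records.records(tp);
      out.put(tp, new OffsetAndMetadata(prs.get(prs.size() - 1).offset() + 1));
    }
    return out;
  }
}
```

Note the `java.util.Properties` import is assumed; the key design point is that `sendOffsetsToTransaction` ties the consumer's progress to the producer's transaction.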
In our talk, we discuss different approaches and highlight an indexing strategy for guaranteeing the order of a range query. We will discuss the pros and cons and finally demonstrate a real-world example of our solution.
In this session, we have a look at a real-world IoT project in which hundreds of residential building complexes are equipped with thousands of sensors and actuators that communicate via Kafka to an optimization system to reduce energy consumption and eventually help to protect our planet.
In order to have a scalable solution, we are using the standard Kafka Connect architecture with a sink connector as the base. Records received by this sink are pseudonymized, and a second record is created alongside.
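A rough sketch of that shape using the Kafka Connect SinkTask API (class, field, and hashing choices here are hypothetical, not the real connector's logic):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Collection;
import java.util.Map;
import org.apache.kafka.connect.sink.SinkRecord;
import org.apache.kafka.connect.sink.SinkTask;

// Hypothetical sink task that pseudonymizes the record key by hashing it
// before handing the data to the target system.
public class PseudonymizingSinkTask extends SinkTask {

  @Override public String version() { return "0.1"; }
  @Override public void start(Map<String, String> props) { }
  @Override public void stop() { }

  @Override
  public void put(Collection<SinkRecord> records) {
    for (SinkRecord record : records) {
      String pseudonym = sha256(String.valueOf(record.key()));
      // 1) deliver the pseudonymized record to the target system...
      // 2) ...and emit a second record (e.g. the pseudonym mapping) elsewhere.
    }
  }

  private static String sha256(String input) {
    try {
      byte[] digest = MessageDigest.getInstance("SHA-256")
          .digest(input.getBytes(StandardCharsets.UTF_8));
      StringBuilder hex = new StringBuilder();
      for (byte b : digest) hex.append(String.format("%02x", b));
      return hex.toString();
    } catch (Exception e) {
      throw new RuntimeException(e);
    }
  }
}
```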
This talk aims to completely demystify the system from an operational perspective. You will learn what is really happening across Kafka and Kafka Streams, how to interpret the logs and metrics, and how to adjust the configs to achieve your desired outcomes.
I will explain a number of actions we took that helped us scale our topologies to process hundreds of millions of listings: use Kubernetes StatefulSets, tune RocksDB configurations, use Horizontal Pod Autoscaling wisely, activate consumer rack awareness, and more.
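For orientation, the client-side settings behind two of those points might be wired up like this (a sketch with hypothetical application id, rack, and setter-class names; brokers must also enable follower fetching for rack awareness to take effect):

```java
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.streams.StreamsConfig;

public class ScalingConfigSketch {
  public static void main(String[] args) {
    Properties props = new Properties();
    props.put(StreamsConfig.APPLICATION_ID_CONFIG, "listings-processor"); // hypothetical
    props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
    // Keep warm standby copies of state on other pods, so a rescheduled pod recovers fast.
    props.put(StreamsConfig.NUM_STANDBY_REPLICAS_CONFIG, 1);
    // Rack awareness: let the embedded consumer fetch from a replica in its own zone.
    props.put(StreamsConfig.consumerPrefix(ConsumerConfig.CLIENT_RACK_CONFIG), "eu-west-1a");
    // RocksDB tuning is plugged in via a custom config setter class (hypothetical name).
    props.put(StreamsConfig.ROCKSDB_CONFIG_SETTER_CLASS_CONFIG,
        "com.example.MyRocksDBConfigSetter");
  }
}
```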
In this talk we will go over our experience in migrating all these technologies and the pitfalls and powerups that we encountered along the way.
At the end of the session you will understand the capabilities of MirrorMaker and the process of building powerful mirroring scenarios with this tool.
During a step-by-step demo, we will look into different real-life examples and scenarios to demonstrate how to bring the observability of your Kafka applications to the next level.
A fun introduction to the world of Kafka Connect and Kafka Streams by using it to process data from Xbox Live. Gaming is a social activity, so Xbox includes a social aspect. Details about what games you play and when you’re playing are shared with your friends on the Xbox Live service.
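For a taste, a toy Kafka Streams topology over hypothetical topic names (a Kafka Connect source would feed the input topic) might look like this:

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Produced;

public class XboxPresenceSketch {
  public static void main(String[] args) {
    // Hypothetical topics: raw presence events land on "xbox-presence";
    // this tiny topology keeps only in-game events per gamertag.
    StreamsBuilder builder = new StreamsBuilder();
    builder.stream("xbox-presence", Consumed.with(Serdes.String(), Serdes.String()))
        .filter((gamertag, event) -> event.contains("\"state\":\"InGame\""))
        .to("friends-now-playing", Produced.with(Serdes.String(), Serdes.String()));
    // builder.build() would then be run with a configured KafkaStreams instance.
  }
}
```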
Armed with these pragmatic best practices, you will be able to successfully bring eventing into your stack and avoid turning your brownfield…into a minefield.
We can all agree that Apache Kafka is an incredibly useful and powerful technology that’s found its way into the heart of a number of companies spanning numerous industries. To the untrained eye, there’s little that Kafka can’t do. But as with any technology, there are limitations.
In this session, we will look at several techniques that we can use to build real-time applications with just Apache Kafka and the confluent-kafka Python package. You may be surprised at how far we can go with these simple tools, but we will also discuss the challenges you might face.
In this talk, we are going to focus on how FREENOW builds an event stream processing pipeline with Kafka Streams and Kafka Connect on Kubernetes to detect GPS-location-based fraudulent trips in real time.
We go in depth into the different scenarios that allow this to happen, the configuration we had chosen (hoping for the best) that made these outages possible or worse, and what we did to reduce the impact while still keeping Kafka configured as desired.
This talk discusses several error-handling patterns you can implement in Kafka consumer applications. We will explore different approaches to handling transient and non-transient errors and highlight the use of dead letter topics in Kafka for message reprocessing.
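One common shape of the dead letter topic pattern, as a minimal sketch (topic names, the `process` stub, and the header name are hypothetical):

```java
import java.nio.charset.StandardCharsets;
import java.time.Duration;
import java.util.List;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class DeadLetterSketch {
  // Hypothetical business logic; throws for messages that can never succeed.
  static void process(ConsumerRecord<String, String> record) throws Exception { }

  // Assumes already-configured clients; "orders" / "orders.DLT" are hypothetical names.
  static void pollLoop(KafkaConsumer<String, String> consumer,
                       KafkaProducer<String, String> producer) {
    consumer.subscribe(List.of("orders"));
    while (true) {
      for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofSeconds(1))) {
        try {
          process(record);
        } catch (Exception e) {
          // Non-transient failure: park the message on a dead letter topic, with
          // the failure reason in a header, and keep the main flow moving.
          // (A transient error, e.g. a network blip, would instead be retried in place.)
          ProducerRecord<String, String> dlt =
              new ProducerRecord<>("orders.DLT", record.key(), record.value());
          dlt.headers().add("error.reason",
              String.valueOf(e.getMessage()).getBytes(StandardCharsets.UTF_8));
          producer.send(dlt);
        }
      }
      consumer.commitSync();
    }
  }
}
```

A separate application can later consume the dead letter topic to inspect, fix, and reprocess the parked messages.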
In this talk, we will explain how Kafka Streams currently restores local state and processes records. We will show how we decouple processing from restoring by moving restoration to a dedicated thread, and how throughput benefits from this decoupling.
In this session, we will talk about kafka-native, which leverages GraalVM native image to compile the Kafka broker to a native executable using the Quarkus framework. After going through some implementation details, we will focus on how it can be used in a Docker container with Testcontainers.
We walk you through our journey of adopting Apache Kafka®, Kafka Connect, and Kafka Streams. We discuss the challenges that we faced and how we overcame them. Over the course of the talk, we provide answers to important questions.
In this end-to-end story, I will present what the issues were at the beginning, how we came up with a plan, how we designed, implemented, and applied it to our existing clusters smoothly, and how clients can now monitor their reserved capacity and even get alerted before it is reached.
In this talk, we will discuss the technical details behind Wise's stream processing platform: security, how we run Apache Kafka brokers on Kubernetes, our deployment model for highly available Kafka Streams applications, and the different self-service tools we have developed.
Come to this talk to learn what to do when the data distribution across topic partitions is badly broken and, as a result, significantly hurts consuming applications' performance, increasing lag and slowing data processing.
This talk will unveil the next generation of the consumer rebalance protocol for Apache Kafka (KIP-848) that addresses the shortcomings of the current protocol.
Join me in a journey of ups and downs that starts with a simple requirement (host an API), through implementing a custom state store and finishes off by describing the challenges we encountered getting our APIs deployed. Don’t expect all "roses and sunshine".
In this talk, we first give an overview of the caveats when integrating such services in Kafka Streams and basic approaches for mitigating those. Second, we present our solution for the timely scaling of complex Kafka Streams pipelines in conjunction with remotely connected APIs.
In this talk, we will explore the use of flamegraphs as a tool for understanding the internals of Apache Kafka and for identifying performance issues. Flamegraphs are a visualization technique that allows you to see the relative usage of CPU and memory by different functions in a program.
In this session, I will cover my personal journey of how I went from 8000 lines of Excel to learning about Kafka and incorporating it into my analytics pipelines. I’ll explore what topics (pun-intended) an analyst should know about Kafka to build an end-to-end analytics pipeline.
In this talk, you will learn ideas for client governance and linting of Kafka client applications. Kafka client governance is essential for the smooth operation of a financial services organization and for maintaining the trust of its customers.
This talk will introduce versioned state stores starting from the basics, discuss the stream-table join use case as motivation, cover operational considerations for users who'd like to use them, and briefly touch on the implementation along the way.
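As a rough sketch of the API shape (assuming Kafka Streams 3.5+, where KIP-889 introduced versioned stores; topic names and retention are hypothetical), a table materialized with a versioned store makes a stream-table join timestamp-aware:

```java
import java.time.Duration;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.kstream.Produced;
import org.apache.kafka.streams.state.Stores;

public class VersionedStoreSketch {
  public static void main(String[] args) {
    StreamsBuilder builder = new StreamsBuilder();

    // Table side: materialized with a versioned store keeping 24h of history.
    KTable<String, String> rates = builder.table(
        "fx-rates",
        Consumed.with(Serdes.String(), Serdes.String()),
        Materialized.as(Stores.persistentVersionedKeyValueStore(
            "rates-versioned", Duration.ofHours(24))));

    // Stream side: each (possibly out-of-order) trade joins against the rate
    // as of the trade's own timestamp, not simply the latest rate.
    KStream<String, String> trades =
        builder.stream("trades", Consumed.with(Serdes.String(), Serdes.String()));
    trades.join(rates, (trade, rate) -> trade + " @ " + rate)
        .to("priced-trades", Produced.with(Serdes.String(), Serdes.String()));
    // builder.build() would then be run with a configured KafkaStreams instance.
  }
}
```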
Join us on a journey where we will share our hard-earned experience, as well as just how to tackle that dread question which just seems to keep popping up: "where's my message?" Did it ever reach Kafka? Can Kafka really lose a message? Is Kafka down?
This talk will explore the real-time analytics technology space from the perspective of the software developer who wants real-time insights in their software. We’ll cover the main categories, how these technologies work, and their strengths and weaknesses.
If you’re interested in a little bit of hardcore tech and how event-driven architecture works at massive scale in a highly secure, GDPR-compliant environment, then this talk is for you!
In this session, you can find out how to build crazy fast stream data pipelines using Apache Flink® over Kafka. Apache Flink® is a distributed stream processing engine that pairs well with Kafka's ability to handle high-volume, high-throughput, and low-latency data streams.
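As a minimal sketch (hypothetical topic, group, and job names, assuming the modern Flink Kafka connector), consuming a Kafka topic from Flink looks like this:

```java
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class FlinkOverKafkaSketch {
  public static void main(String[] args) throws Exception {
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

    // Reads a Kafka topic as an unbounded stream.
    KafkaSource<String> source = KafkaSource.<String>builder()
        .setBootstrapServers("localhost:9092")
        .setTopics("clicks")
        .setGroupId("flink-pipeline")
        .setStartingOffsets(OffsetsInitializer.earliest())
        .setValueOnlyDeserializer(new SimpleStringSchema())
        .build();

    env.fromSource(source, WatermarkStrategy.noWatermarks(), "kafka-clicks")
        .filter(line -> !line.isBlank()) // trivial stand-in for real processing
        .print();

    env.execute("flink-over-kafka-sketch");
  }
}
```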
Load balancing is a key factor in achieving high performance and cost efficiency for Kafka clusters. It helps save the over-provisioned resources caused by skewed brokers, whether CPU, memory, or disk storage.
In this lightning talk session, we will discuss a messaging benchmark tool developed at Reddit called Bench. Bench quantifies the cost-performance trade-offs of various configurations of messaging systems.
In this presentation, we will answer the above questions and cover the following topics:
• Breaking down how a connector/task is created
• Breaking down each Kafka Connect protocol (eager, compatible, sessioned)
• Walking through rebalances for each protocol
• Pros and cons of each protocol
The main advantage of our solution lies in its very low memory footprint: while such a feature is important for any computing solution, it is especially valuable in situations like ours, where hundreds of messages per second are received.
In this talk, we will discuss the double write problem in Apache Kafka and how the outbox pattern can be implemented to solve it. We will also demonstrate the use of the outbox pattern in a sample Kafka application and show how it can be used to ensure data consistency and integrity.
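For illustration, here is a minimal sketch of the pattern (hypothetical table and column names): the business row and the outbox row are written in one local database transaction, and a separate relay publishes outbox rows to Kafka:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class OutboxSketch {
  static void createOrder(String jdbcUrl, String orderId, String payloadJson)
      throws SQLException {
    try (Connection conn = DriverManager.getConnection(jdbcUrl)) {
      conn.setAutoCommit(false);
      try (PreparedStatement order = conn.prepareStatement(
               "INSERT INTO orders (id, status) VALUES (?, ?)");
           PreparedStatement outbox = conn.prepareStatement(
               "INSERT INTO outbox (aggregate_id, event_type, payload) VALUES (?, ?, ?)")) {
        // Business write and event write commit (or roll back) together,
        // so the database and the emitted event can never diverge.
        order.setString(1, orderId);
        order.setString(2, "CREATED");
        order.executeUpdate();
        outbox.setString(1, orderId);
        outbox.setString(2, "OrderCreated");
        outbox.setString(3, payloadJson);
        outbox.executeUpdate();
        conn.commit();
      } catch (SQLException e) {
        conn.rollback();
        throw e;
      }
    }
    // A separate relay (e.g. a CDC connector such as Debezium, or a simple poller)
    // reads the outbox table and publishes each row to Kafka asynchronously.
  }
}
```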
At Cloudflare we are big Kafka adopters and we run Kafka at a massive scale. We deploy our microservices leveraging Kafka on Kubernetes, and we have gained some interesting experience in keeping those services operational and avoiding downtime.
In this session, we show you how kash.py can be used to bring the two disparate worlds of files and streaming together, and thus not only save a lot of time and money hiring real-time and streaming experts, but also make your data scientists start loving real-time.
In this talk, we will discuss how Fidelity Investments modeled a unique API to seamlessly lift and shift application topics, ACLs, quotas, and every other entity from lower-environment to higher-environment clusters.
Join us for a detective hunt as we discuss tools and metrics used to detect misconfigurations in clients, how to address them once discovered, and ways in which to ensure that new occurrences are prevented from arising in the future.
Join us for this live demo to see kcctl in action, also touching on some advanced tricks like templating and setting up multiple connectors at once using jsonnet. You'll learn how kcctl sparks joy and boosts your productivity when interacting with Kafka Connect from your shell.
Join us for a lightning session where we'll show you how to do proper integration testing for custom SMTs, using Testcontainers as the best, most accurate, and solid way to test these complex Kafka Connect components.
In this session, we discuss ways to demystify Kafka cluster costs and explore methods to potentially reduce your cloud spend.
Have you ever had your stateful Kafka Streams app killed by Kubernetes with the termination reason "OOMKilled"? Even if you did set a JVM heap limit, the pod still got killed? This is likely due to your RocksDB off-heap memory usage. This talk will explore ways of diagnosing the problem.
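One common remedy is bounding RocksDB's off-heap memory across all stores, a sketch of the approach described in the Kafka Streams memory-management docs (the byte budgets here are illustrative):

```java
import java.util.Map;
import org.apache.kafka.streams.state.RocksDBConfigSetter;
import org.rocksdb.BlockBasedTableConfig;
import org.rocksdb.Cache;
import org.rocksdb.LRUCache;
import org.rocksdb.Options;
import org.rocksdb.WriteBufferManager;

public class BoundedMemoryRocksDBConfig implements RocksDBConfigSetter {

  // One shared block cache and write buffer manager for every store instance,
  // so total off-heap usage stays within a predictable budget.
  private static final long TOTAL_OFF_HEAP_BYTES = 256 * 1024 * 1024L;
  private static final long TOTAL_MEMTABLE_BYTES = 64 * 1024 * 1024L;
  private static final Cache CACHE = new LRUCache(TOTAL_OFF_HEAP_BYTES);
  private static final WriteBufferManager WRITE_BUFFER_MANAGER =
      new WriteBufferManager(TOTAL_MEMTABLE_BYTES, CACHE);

  @Override
  public void setConfig(String storeName, Options options, Map<String, Object> configs) {
    BlockBasedTableConfig tableConfig = (BlockBasedTableConfig) options.tableFormatConfig();
    tableConfig.setBlockCache(CACHE);
    tableConfig.setCacheIndexAndFilterBlocks(true); // count index/filter blocks against the cache
    options.setWriteBufferManager(WRITE_BUFFER_MANAGER);
    options.setTableFormatConfig(tableConfig);
  }

  @Override
  public void close(String storeName, Options options) {
    // The cache and write buffer manager are shared; do not close them per store.
  }
}
```

The setter is registered via the StreamsConfig.ROCKSDB_CONFIG_SETTER_CLASS_CONFIG (rocksdb.config.setter) property, after which every RocksDB store draws from the shared budget.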
During this presentation, we will delve into the ways in which Event Stream Registry, a dataset that outlines the intended state of an event-driven system, can tackle these common challenges.