Current 2023

View the sessions and slides

Keynotes

Kafka, Flink, and Beyond

  • Satish Duggana, Uber
  • Danica Fine, Confluent
  • Ismael Juma, Confluent
  • Anna McDonald, Confluent
  • Tobias Nothaft, BMW Group
  • Hitesh Seth, JPMorganChase
  • Martijn Visser, Confluent

In this keynote, leaders from the Kafka and Flink communities will highlight recent contributions, upcoming project improvements, and the innovative applications that users are building to shape the future of data streaming.

Streaming into the Future: The Evolution and Impact of Data Streaming Platforms

  • Shaun Clowes, Confluent
  • Joseph Foster, NASA
  • Jay Kreps, Confluent
  • Girish Rao, Warner Bros. Discovery (WBD)
  • Daniel Sternberg, Notion

This keynote will present how a Data Streaming Platform built on Kafka can enable organizations to stitch together data from across the organization to produce high-value data assets that can be shared and used to support operational applications and data analytics, including exciting new use cases.

Breakout Sessions

(Sponsored session) (real-time)²: Real-time data for real-time analytics with Kafka and ClickHouse

  • Ryadh Dahimene, ClickHouse
  • Dale McDiarmid, ClickHouse

In this talk, we'll explore this dynamic synergy between Kafka and ClickHouse with a live demonstration leveraging OpenSky data. We’ll use ClickPipes, the ClickHouse Cloud native Kafka integration, for building an end-to-end real-time data processing and analytics solution.

(Sponsored session) 3 Flink Mistakes We Made So You Won't Have To

  • Sharon Xie, Decodable
  • Robert Metzger, Decodable

We will talk about avoiding data loss with Flink’s Kafka exactly-once producer, configuring Flink to get the most bang for the buck out of your memory configuration, and tuning for efficient checkpointing.

4 Patterns to Jumpstart your Event-Driven Architecture Journey

  • Hans-Peter Grahsl, Red Hat

During this session, we explore selected event-driven architecture patterns commonly found in the field: the claim-check pattern, the content enricher pattern, the message translator pattern, and the outbox pattern.

5-minute Practical Streaming Techniques that can Save You Millions

  • Zhenzhong Xu, Claypot AI

In this talk, I will share some simple optimization techniques you can apply with streaming SQL in just a few minutes that can cut costs by 10x or even 100x. Then, we’ll gradually dive deeper into some novel optimization techniques that can be applied across your distributed storage.

10 tips for enabling data discovery and governance in your organization

  • Sherin Thomas, Chime

In this presentation, I will talk about how we solved the problem of cataloging and discovery using Datahub as our discovery platform. I will cover the details of how we went about ingesting metadata from a plethora of infrastructure and platform components.

A Glide, Skip or a Jump: Efficiently Stream Data into Your Medallion Architecture with Apache Hudi

  • Nadine Farah, Onehouse
  • Ethan Guo, Onehouse

In this talk, attendees will learn about the current challenges of building a medallion architecture at low latency, how the record index and incremental updates work with Apache Hudi, and how the new Hudi CDC feature unlocks incremental processing on the lake.

A Practical Guide To End-to-End Tracing In Event Driven Architectures

  • Roman Kolesnev, Confluent

In this session, you will gain an understanding of the importance of end-to-end traceability, and several tools & examples for improving observability in your own distributed event driven applications.

(Sponsored session) A Trifecta of Real-Time Applications: Apache Kafka, Flink, and Druid

  • Gian Merlino, Imply
  • Kai Waehner, Confluent

Join this session to learn about Apache Druid and why companies use it in combination with Kafka and Flink for real-time applications. Learn how Apache Druid complements Flink and Kafka - and what makes it purpose-built for analyzing streams and events.

(Sponsored session) Accelerate Trusted AI by Mastering Data-in-Motion

  • Ram Venkatesh, Cloudera

Cloudera integrates NiFi, Kafka, and Flink into a single platform bringing unparalleled speed and flexibility to your AI pipelines.

(Sponsored session) Accelerating Path to Production for Generative AI-powered Applications

  • Prakul Agarwal, MongoDB
  • David Macias, MongoDB

In this session, we will discuss some recent developments in Generative AI and how those can be leveraged to build intelligent applications. Learn how to bring the power of large language models (LLMs) to your private, real-time operational data across multiple data types.

Analytics: The Final Data Frontier (or, Why Users Need Your Data and How Pinot Serves it to Them)

  • Tim Berglund, StarTree

Come to this talk to understand the forces that have given rise to this class of database, learn about Pinot's internals, and see some examples of it in action.

Anomaly Detection on Time Series Data Using Apache Flink

  • Ali Zeybek, Ververica

In this talk, we will walk through the steps to implement a real time anomaly detection system on time series data using Apache Flink. We will implement and compare several algorithms, Exponentially Weighted Moving Average (EWMA) and Probabilistic EWMA (PEWMA), from an academic paper.
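For readers unfamiliar with the first of these algorithms, the following is a minimal sketch of EWMA-based anomaly detection in plain Python. It is not the speaker's implementation and does not use Flink; the function name `ewma_anomalies` and the parameters `alpha`, `threshold`, and `warmup` are assumptions chosen for illustration. PEWMA, which additionally weights updates by the probability of each observation, is not shown.

```python
import math


def ewma_anomalies(values, alpha=0.3, threshold=3.0, warmup=5):
    """Return indices of points that deviate from the running EWMA
    by more than `threshold` exponentially weighted standard deviations.

    alpha:     smoothing factor in (0, 1]; higher reacts faster (assumed default)
    threshold: number of standard deviations that counts as anomalous
    warmup:    number of initial points used only to seed the statistics
    """
    mean = None
    var = 0.0
    anomalies = []
    for i, x in enumerate(values):
        if mean is None:
            # Seed the running mean with the first observation.
            mean = float(x)
            continue
        std = math.sqrt(var)
        # Compare the new point against the statistics of the past only.
        if i >= warmup and std > 0 and abs(x - mean) > threshold * std:
            anomalies.append(i)
        # Incremental update of the exponentially weighted mean and variance.
        # Note: this simple sketch also folds anomalous points into the statistics.
        diff = x - mean
        incr = alpha * diff
        mean += incr
        var = (1 - alpha) * (var + diff * incr)
    return anomalies


readings = [10, 10.5, 9.5, 10.2, 9.8, 10.1, 9.9, 50, 10.0, 10.1]
print(ewma_anomalies(readings))
```

In a streaming job, the running mean and variance would typically be kept as per-key state and updated one event at a time, which is why the incremental form of the update is used here.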

Apache Kafka's Next-Gen Rebalance Protocol: Towards More Stable and Scalable Consumer Groups

  • David Jacot, Confluent

In this talk, we will dive into all aspects of the new protocol, look into the architecture of Apache Kafka's brand new group coordinator, discuss the upgrade path for both the consumers and the brokers, and finally update the community about where we stand in the development.

(Sponsored session) Architecting Scalable IoT Systems with MQTT and Kafka

  • Christian Meinerding, HiveMQ

Join to learn how MQTT and Kafka combine to handle millions of data points with ease, allowing Rimac to deliver a scalable and seamless customer experience. Attendees will glean insights into the architectural strategies that manage vast device networks and high-velocity data.

Autoscaling Confluent Cloud: Should We? How Would We?

  • Amanda Gilbert, Confluent

In this session, we’ll explore the topic of auto-scaling by implementing a strategy for Confluent Cloud resources. We’ll first discuss common use cases that dictate a need to create a scaling strategy for Confluent Cloud and introduce the approaches best suited for each use case.

(Sponsored session) AWS S3 Connector to Backup/Restore

  • Adamos Loizou, Lenses.io

In this talk we will show you the latest feature-packed Lenses open-source S3 Source & Sink connectors, patterns & best practices for backing up and restoring to S3, and how to have a seamless, one-click backup and restore experience in Lenses 5.3.

Beyond a Lifetime of Data

  • Vanessa Burckard, Social Security Administration

By shifting our processes to an Event Driven Architecture, one business line at a time, we hope to set the stage to revisit our critical data query models, while highlighting and correcting the data quality issues that inevitably built up over decades.

Beyond Monoliths: Thrivent’s Lessons in Building a Modern Integration Architecture

  • Andrew Kolb, Thrivent Financial

In this talk, we will showcase strategies, patterns, and techniques that were developed during an effort to abstract (and strangle) 20+ internal systems into a unified data integration platform.

(Sponsored session) Boosting Kafka Performance in a Day

  • Jiří Holuša, Azul Systems

After the session, you'll understand the importance of the underlying JVM and how you can leverage this knowledge to boost the performance of the cluster to achieve better SLAs or reduce infrastructure costs.

Build Real-time Machine Learning Apps on Generative AI with Kafka Streams

  • Steffen Hoellinger, Airy

In this talk we show how data and machine learning teams can rapidly prototype and deploy real-time ML apps, ingesting real-time data with the help of Apache Kafka® and Airy, an open-source app framework. We will discuss different options to fine-tune LLMs and "chaining" them with other ML models.

(Sponsored session) Build Streaming Data Applications in Minutes Using SwimOS and Nstream

  • Jeremy Custenborder, Nstream

Jeremy will showcase a live demo, illustrating the tangible advantages: fewer managed data systems, reduced latency, simpler coding, and interactive UIs. Whether a Kafka novice or a seasoned user, walk away from this session with actionable takeaways and tools for your business.

Building a Dynamic Rules Engine with Kafka Streams

  • Will LaForest, Confluent
  • Michael Peacock, Confluent

In this talk we will cover the architecture of our Kafka Streams layer that makes it possible to use external data feeds as rule input, how we handle dynamic criteria for joins and filters, best practices for writing dynamic rule engines in Kafka Streams, and upcoming improvements to Kafka Streams.

Building a Winning Data Engineering Culture

  • Xinran Waibel, Netflix

In this session, we will talk about the roles of data engineers, why data engineering is critical to the success of data organizations, and how to build a winning data engineering culture that empowers both data engineers and partners.

Business Event Driven Architecture & Governance in Action

  • Wim Debreuck, Cymo.eu

In this talk, we’ll share several experiences in setting up a COE for large industrial companies and for insurance and logistics environments, from setting up a strong foundation and defining event designs, best practices, and principles to guiding development teams.

Configuring Kafka Connect To Be Successful At Scale

  • Travis Sweet, Confluent

This presentation will look at the best practices to configure Kafka Connect to output important data when evaluating if more resources are needed for a Kafka Connect worker or if a new node should be added to the cluster overall.

Datalake Rock Paper Scissors: Iceberg + Flink or Iceberg + Spark?

  • Sitarama Chekuri, Bloomberg
  • Ben de Vera, Bloomberg

In this session, we'll share our experiences and lessons learned working with both technologies to ingest data from Kafka into our Iceberg datalake at near-real-time speeds.

Debezium Snapshots Revisited!

  • Gunnar Morling, Decodable

In this session you'll learn how this innovative scheme of interleaving snapshot queries and log-based change events works under the hood and how it solves common tasks when running CDC pipelines. We'll also discuss advanced topics like parallelising snapshots and customising snapshot contents.

Deeply Declarative Data Pipelines

  • Ryanne Dolan, LinkedIn

In this talk, we'll explore just how "declarative" we can make streaming data pipelines on Kubernetes. I'll show how we can go deeper by adding more and more operators to the stack. How deep can we go?

Don’t Let Degradation Bring You Down: Automatically Detect & Remediate Degraded Storage in Kafka

  • Rittika Adhikari, Confluent

In this talk, we will discuss how we have tackled this problem head-on with a fully automated degraded storage detection and remediation system. We’ll highlight the importance of monitoring storage performance and take a deep-dive into how we formulated the detection algorithm.

Dynamic Change Data Capture with Flink CDC and Consistent Hashing

  • Xiao Meng, Goldsky
  • Yaroslav Tkachenko, Goldsky

At Goldsky, we needed a way to configure CDC for a large Postgres database dynamically: the list of tables to ingest is driven by customer-facing features and is constantly changing.

Evolution of Real-time User Engagement Event Consumption at Pinterest

  • Heng Zhang, Pinterest
  • Lu Liu, Pinterest

We will discuss how we at Pinterest transformed real time user engagement event consumption.

Evolution of Streaming Pipeline at Lyft

  • Rakesh Kumar, Lyft

Rakesh discusses how Lyft organically evolved and scaled the streaming platform that provides a consistent view of the marketplace, helping individual teams run their optimizations independently.

Exactly-Once Semantics Revisited: Distributed Transactions across Flink and Kafka

  • Alexander Sorokoumov, Confluent
  • Tzu-Li (Gordon) Tai, Confluent

In this session, you’ll see how the Flink and Kafka communities are uniting to tackle these long-standing technical debts. We’ll introduce the basics of how Flink achieves EOS with external systems and explore the common hurdles that are encountered when implementing distributed transactions.

Fast Fourier Transform (FFT) of Time Series in Kafka Streams

  • Igor Khalitov, Confluent

In this session, we’ll explore transforming signals from the time domain to the frequency domain using FFT, maximizing the level of compression of input signals while building a precise frequency alert system.

Flink SQL: The Challenges to Build a Streaming SQL Engine

  • Jingsong Li, Alibaba

In this session, we will explore the challenges that arise when building a modern streaming SQL engine like Flink SQL.

Forecasting Kafka Lag Issues with Machine Learning

  • Kumaran Ponnambalam, Cisco Systems Inc

In this session, we will discuss our work in this regard using machine learning. We will discuss popular lag patterns and how our ensemble forecasting system learns from the past and predicts future trends. We will also showcase some case studies and benefits of having such a system.

From 0 to 300 mph Towards the Promised Land

  • Michael Rosam, Quix
  • Tun Shwe, Quix

In this session, Mike Rosam and Tun Shwe will share their experiences of building data teams at McLaren and fast growth startups. They will take you on a journey of how they navigated their way from the old batch world to the new streaming world.

From 🐛 to 🦋: Data Pipelines Evolution from Batch to Streaming

  • Francesco Tisiot, Aiven

This session explores how Apache Flink can narrow the gap between batch and streaming by keeping the same data pipelines definition while the underlying technology evolves.

(Sponsored session) From Edge to Cloud, Creating Data Pipelines Using Open-source with Strimzi on Kubernetes

  • Carles Arnal, Red Hat
  • Chris Cranford, Red Hat

In this session, we’ll walk through a real-world example of capturing changes made in a relational database with a connector configured to use Apicurio Registry, publishing those changes to Kafka serialized in Avro’s compact binary form, and utilizing Kafka Streams, Quarkus, and Camel-K.

(Sponsored session) From the Battlefield: Squeezing the Most From Your Kafka Infrastructure

  • Stuart Mould, Conduktor

In this talk, we’ll explore the problems you’ll experience as your Kafka infrastructure expands and many clever solutions to mitigate them.

From Raw Data to an Interactive Data App in an Hour: Powered by Snowpark Python

  • Vinodhini Duraisamy, Snowflake

In this talk, you will learn to build a Streamlit data application to help visualize the ROI of different advertising spends of an example organization.

General Coordinates Network: Harnessing Kafka for Real-Time Open Astronomy at NASA

  • Judith Racusin, NASA Goddard Space Flight Center

In this talk, we will discuss architectural choices, challenges, and lessons learned in adapting Kafka for open science and open data. Our novel approach to OpenID Connect / OAuth2 in Kafka is designed to securely scale Kafka from access inside a single organization to access by the general public.

(Sponsored session) Get More from your Data: Accelerate Time-to-Value and Reduce TCO with Confluent Cloud on AWS

  • Joseph Morais, Confluent
  • Weifan Liang, Amazon Web Services

In this talk, we will explore how Confluent and Amazon Web Services (AWS) work together to help you in the journey of data modernization and innovation.

Getting Data In and Out of Flink - Understanding Flink and Its Connector Ecosystem

  • Martijn Visser, Confluent

By the end of this talk, attendees will have a solid understanding of Flink connectors, the connector interface, and be better equipped to build efficient and reliable data processing pipelines with Flink.

Getting Under the Hood of Kafka Streams: Optimizing Storage Engines to Tune Up Performance

  • Bosmat Tuvel, Speedb

This presentation will discuss the importance of optimizing and choosing storage engines for Kafka Streams applications.

Go Big or Go Home: Approaching Kafka Replication at Scale

  • Julia Holgado, New Relic

In this talk we'll discuss what bottlenecks we have hit as we scaled out, and what measures we took to remove them, such as replicating data based on Kafka headers, connecting to many source and destination Kafka clusters, and managing the replication of Kafka topics of varying traffic.

Handle Millions of IoT Devices Connected to Kafka via MQTT

  • John Fallows, Aklivity

This session is targeted towards developers interested in learning how to use Kafka as the data plane for their MQTT broker infrastructure, without needing to run separate MQTT brokers.

(Sponsored session) How Curing “Kafkatosis” Can Improve Your Stream Management and Governance

  • Jonathan Schabowsky, Solace

In this talk, Schabowsky will introduce the causes and symptoms of Kafkatosis, such as a lack of stream reuse, inefficient application onboarding, and operational disruptions. He will help you understand the nature and impact of these problems through real-world examples.

How We Built Nucleus: Community Brands' Analytics Platform

  • Curt Buechter, Community Brands

In this talk, we describe how the Nucleus engineering team built a real time, user-facing analytics app.

Indeed Flex: The Story of a Revolutionary Recruitment Platform

  • Gayathri Veale, Indeed
  • Ronak Patel, Indeed

If you’re in discussions surrounding event driven systems at your organization then this talk is for you. Join Ronak and me for this talk and let’s have a discussion.

(Sponsored session) Introducing Oxia: A Scalable Zookeeper Alternative

  • David Kjerrumgaard, StreamNative

Event streaming platforms like Kafka have traditionally leaned on ZooKeeper as the cornerstone for coordination and metadata management. This presentation introduces Oxia, a compelling alternative solution.

Isolating Streaming Ingest and Queries Using RocksDB

  • Nathan Bronson, Rockset

In this talk, we will present a real-time analytics architecture we implemented in the Rockset database, based on RocksDB, that effectively isolates streaming data ingestion from query serving.

Learnings of Running Kafka Tiered Storage at Scale

  • Satish Duggana, Uber
  • Abhijeet Kumar, Uber

We will talk about the principles followed in building the feature, the journey of deploying and running it in our production clusters with different workloads, and the learnings from running it in production at a large scale that led to a few interesting features extended from KIP-405.

Maximizing Real-Time Data Processing with Apache Kafka and InfluxDB: A Comprehensive Guide

  • Zoe Steinkamp, InfluxData

This talk will go into connecting Apache Kafka and InfluxDB and the why, how, and what you can accomplish by doing so.

Mitigating Kafka Broker ‘Gray’ Failures For Key Based Partitioners With Partition Multihoming

  • Christopher Wildman, New Relic

In this talk I describe partition multihoming (PMH), a form of virtual partitioning where two or more physical Kafka partitions are guaranteed to be consumed by the same consumer instance.

Need for Speed: Machine Learning in the Era of Real-Time

  • Oli Makhasoeva, Bytewax

By the end of our talk, attendees will leave with an understanding of the latest RTML techniques and the essential factors to consider when designing and implementing real-time machine-learning solutions.

Off-Label Data Mesh: A Prescription for Healthier Data

  • Adam Bellemare, Confluent

If you've ever provided a customer with an analytical report that differed from their operational conclusions, then this talk is for you.

OpenMessaging Benchmark: Measuring the Performance of Streaming Systems

  • Matteo Merli, StreamNative

This talk will present the OpenMessaging Benchmark, why it was created, and how one can use it to model messaging workloads and verify the behavior of different systems.

(Sponsored session) Operationalizing Pluralsight's Data with Materialize

  • Arjun Narayan, Materialize

In this talk, we'll present how Pluralsight is replacing streaming systems with an operational data warehouse for their real-time use cases. Today, Materialize powers Pluralsight's Plan Analytics and core data models for their content offerings.

(Sponsored session) Optimize Costs and Scale Your Streaming Applications with Virtually Unlimited Storage from AWS Services

  • Todd McGrath, Amazon Web Services
  • Vidhi Taneja, Amazon Web Services

In this session you will learn how AWS streaming data services can power your streaming applications that can scale to virtually unlimited storage with Tiered Storage. Discover how AWS services collaborate to address diverse EDAs, CDC applications and real-time analytics use cases.

Our Multi-Year Journey to a 10x Faster Confluent Cloud

  • Marc Selwan, Confluent
  • Shriram Sridharan, Confluent

By attending this talk, attendees will be able to take our learnings from making Confluent Cloud latencies 10x better and possibly apply similar principles to their cloud native data streaming systems.

(Sponsored session) Put Events to Work and Respond in Real Time

  • Alan Chatt, IBM

Join me to learn how IBM Event Automation, a composable solution, puts your events to work by enabling both business and IT users to detect scenarios, act in real time, and automate decisions.

Query Your Streaming Data on Kafka using SQL: Why, How, and What

  • Gang Tao, Timeplus
  • Jove Zhong, Timeplus

In this talk, we will introduce the audience to the world of querying streaming data on Apache Kafka with SQL, compare and contrast the features and capabilities of each of these tools, and provide an in-depth analysis of their respective pros and cons.

Resilient Kafka: How DNS Traffic Management and Client Wrappers Ensure Availability

  • Vanessa Vuibert, Shopify

In this talk, we'll run through our Kafka infrastructure at Shopify and how clients connect to it. Next, we'll describe our solution for performing failovers using DNS. Afterwards, we'll look at some real world scenarios where this system saved us from major outages.

Restate: Stream Processing, but for Microservices

  • Stephan Ewen, Restate
  • Giselle van Dongen, Restate

In this session, we share ideas from a novel system we are developing, called 'Restate'. Our work is inspired by event-sourcing and stream processing systems, but rethought from the ground up for microservices.

Robinhood’s Kafkaproxy: Decoupling Kafka Consumer Logic from Application Business Logic

  • Tony Chen, Robinhood
  • Mun Yong Jang, Robinhood

If your organization is looking to centralize Kafka consumption logic to a singular client library (instead of multiple different client libraries), please attend this talk to see how Robinhood does it so that the infrastructure team can focus development on a singular library.

Rule Based Asset Management Workflow Automation at Netflix

  • Burak Bacioglu, Netflix
  • Meenakshi Jindal, Netflix

We implemented a workflow rule engine that allows users to define rules and conditions to specify the applicable workflows for assets, based on their types, metadata and states.

Save Money by Uncovering Kafka’s Hidden Cloud Costs

  • Addison Huddy, Confluent

In this session, we will dive into Kafka’s network, storage, compute costs, and more to show you how to calculate and anticipate the bill for your Kafka deployment. And to take it a step further, we’ll explore what Confluent has done to reduce Kafka cloud costs for ourselves and our customers.

Scalable E-Commerce Data Pipelines with Kafka: Real-Time Analytics, Batch, ML, Data Lake, and Beyond

  • Mahendra Kumar, BigCommerce
  • Aristatle Subramaniam, BigCommerce

In this talk, we'll share how we tackled the challenge of building a fully managed robust data pipeline using a combination of streaming analytics, batch processing, data lake, and machine learning.

Securing Your Streaming Data with Role Based Access Control: What, Why and How

  • Hojjat Jafarpour, DeltaStream
  • Krishna Raman, DeltaStream

In this talk, we first present the state of the art in Role-Based Access Control for streaming data in Apache Kafka. We then present a novel approach where we bring the same RBAC concepts from relational systems to the data in motion space and compare it with the current solutions.

Side Effects Are Why We Can’t Have Nice Things

  • Kris Jenkins

Join me to understand what side effects are and why they matter. You’ll learn to spot them in your own code. You’ll see how they sneak into our tests, our APIs and our systems’ designs and make everything harder. Then we’ll look at our industry’s many solutions to side effects.

Standing on the Shoulders of Open-Source Giants: The Serverless Realtime Lakehouse

  • Frank Munz, Databricks

This session kicks off with a technical, no-nonsense introduction to the lakehouse concept, dives deep into the lakehouse architecture and recaps how a data lakehouse is built from the ground up with streaming as a first-class citizen.

(Sponsored session) Stream Processing Solution for the Enterprise

  • Jun Qin, Ververica

We will cover topics such as Flink application lifecycle management, Flink SQL development, multi-tenancy, security, cost optimization, business continuity, customer support, deployment options, and how they are supported in our product Ververica Platform.

Streaming is a Detail

  • Amy Chen, dbt Labs
  • Florian Eiden, dbt Labs

In this session, we’ll discuss how we see streaming at dbt Labs. We will dive into how we are extending dbt to support low-latency scenarios and the recent additions we have made to make batch and streaming allies in a DAG rather than archenemies.

Streaming Solutions Showdown

  • Sophie Blee-Goldman, Responsive

By the end of the panel, you’ll be able to make a more informed decision choosing a streaming technology for your next project!

(Sponsored session) Striim: Unifying Change Data Capture, AI/ML, and Exactly-once Processing in a Managed Scalable, Streaming Platform

  • Varun Verma, Striim

Learn how Striim architected and manages a unified data streaming platform optimized for fast deployment of event stream delivery and processing pipelines that deliver real-time analytics and AI for business use cases.

Sustainability & Streaming Data: Merging Real-Time Insights, Green Futures & Profitability

  • Chris Sachs, Nstream

Chris will explore how organizations can leverage real-time insights to reduce waste, conserve resources, lower carbon footprints, and reduce operational expenses. He will also discuss the ethical considerations around collecting and using environmental data.

The Interactive Kafka Light Show

  • Barry Tarlton, Nationwide

Come join our interactive session as we trip the light fantastic in this colorful eye-opening journey into the event streaming dream.

The Nuts and Bolts of Kafka Streams - An Architectural Deep Dive

  • Matthias J. Sax, Confluent

In this talk, we will explore the internal architecture of Kafka Streams to set you up for successfully running and tuning your applications. What does the internal threading model look like? How are partitions assigned and mapped to tasks? Why are there multiple internal consumers?

The Wonderful World of Apache Kafka

  • Dave Klein, Tabular

In this session, we'll have a gentle introduction to Apache Kafka, and then a survey of some of the more popular components in the Kafka ecosystem. We'll look at the Kafka Producer and Consumer libraries, Kafka Connect, Kafka Streams, the Confluent Schema Registry, and more.

Time-State Analytics

  • Henry Milner, Conviva
  • Vyas Sekar, Conviva

In this talk, we will share our experiences to explain why state-of-the-art systems offer poor abstractions to tackle such workloads and why they suffer from poor cost-performance tradeoffs and significant complexity.

Transactions in Action: the Story of Exactly Once in Apache Kafka

  • Justine Olshan, Confluent

This talk will give a refresher on transactions and idempotency and chronicle the various KIPs that improved the protocol over the years. We will also discuss the problem of hanging transactions and how KIP-890 hopes to solve it as well as strengthen the transactional protocol overall.

(Sponsored session) Unifying Stream Processing with a Fast Data Store

  • Fawaz Ghali, Hazelcast
  • Michael Goldverg, BNY Mellon

In this session, we will describe an architecture that addresses simplicity and performance in stream processing deployments, while also reducing cost. This architecture aims for fewer moving parts, fewer clusters and servers to manage, fewer network hops, higher throughput, and lower latency.

Unleashing your Kafka Streams Application Metrics!

  • Neil Buesing, Kinetic Edge

Let's uncover what you should be monitoring, why you should be monitoring it, and leave you with properly monitored Kafka Streams applications.

(Sponsored session) Unlocking Real-Time Data Insights: Leveraging Confluent Cloud on Azure for Streamlined and Scalable Data Pipelines

  • Jacob Bozorov, Microsoft
  • Vlad Kozlov, Microsoft
  • Ram Dhakne, Confluent

Traditional data pipelines face scalability and cost challenges due to monolithic design and batch processing. They assume all data must be stored in one location, leading to time-consuming, expensive, and error-prone processes.

Unlocking the Power of Apache Flink: An Introduction in 4 Acts

  • David Anderson, Confluent

During this talk, we'll bring these principles to life with real-world examples and demos.

Using Kafka at Scale - A Case Study of Micro Services Data Pipelines at Evernorth Health Services

  • Nilay Sundarkar, Evernorth Health Services

In this talk, we look at textbook examples of using Kafka at scale. Focusing on Evernorth Health Services' journey of implementing microservices data pipelines, we provide an overview of the patterns we used while implementing CDC data pipelines for these microservices using Kafka.

What's in store? Part Deux; Creating Custom Queries with Kafka Streams IQv2

  • Bill Bejeck, Confluent

In this talk, I will cover building an Interactive Query service, including routing queries between app instances, creating custom queries using Interactive Query v2, and testing your IQ application.

When Only the Last Writer Wins We All Lose: Active-Active Geo-Replication in Venice

  • Zachary Policzer, LinkedIn

This talk presents Venice and deep dives into how we designed it to enable high data ingestion volumes via Kafka, merging it all coherently from many data sources and many geographically distributed regions. We’ll cover how Venice’s conflict resolution strategy can be a powerful abstraction.

(Sponsored session) Why BYOC is the Future of Cloud Services

  • Christina Lin, Red Hat

In this talk we’ll cover an architectural overview of BYOC, a behind-the-scenes look at BYOC and its deployment strategies, and a demo of Redpanda BYOC.

Why Serverless Flink Matters - Blazing Fast Stream Processing Made Scalable

  • Jean-Sébastien Brunner, Confluent
  • Mayank Juneja, Confluent

Attendees will learn the benefits of serverless and see how it fits into the context of stream processing. We’ll then kick off a demo where we’ll focus on a real world production use case that uses Flink jobs to power an application with extremely low latency.

Workflow Engines & Event Streaming Brokers - Can They Work Together?

  • Natan Silnitsky, Wix

In this talk we will learn about the tradeoffs between the two technologies and how to implement various use cases in each architecture, including those that need a little more work.

Current 2023 Closing Session

  • Danica Fine, Confluent

We’ll celebrate the speakers (and attendees) who helped make this conference possible, highlight some of the best sessions from the past two days, and hand out the prestigious Data Streaming awards. But that’s not all.

Lightning Talks

A New UI for Kafka Connect: Your Favorite IDE!

  • Greg Harris, Aiven

We’ll introduce the LSP, how it enables simple development of cross-cutting IDE features, and how we’ve adapted the LSP to handle Connector Configurations.

Accurately Backtesting Real-time Streaming Features

  • Tony Wang, Stanford

In this talk, I will demonstrate that it is impossible to accurately answer this question. Approaches such as Flink's batch mode are not able to accurately handle late events and indicate when in processing time a window closed. In fact, they practically guarantee feature leakage.

CI/CD patterns for dbt Projects

  • Marta Paes, Materialize

If you’ve ever wondered how to integrate dbt into your CI/CD processes (think automatic project linting, spinning up testing environments, parsing the manifest file), this session is for you!

Community Gardening: Lessons from Open Source Interactions

  • David Handermann, Cloudera

This session offers lessons learned from contributing to open source projects, highlighting ways to engage regardless of technical expertise or engineering background.

Flavors of HA

  • Sreeram Ramji, Robinhood

This talk discusses how we removed SPoF through investments in Kafka infrastructure and our client libraries, letting us support a multitude of requirements of various systems inside Robinhood.

Fluvii: A Lightweight Kafka Streams Client for Python

  • Tim Sawicki, Quix

In this talk, we will briefly explore Fluvii via its feature set and some simple examples so that you can confidently get started with it!

Kafka Latency Analyzer: Get Insights into Per-record, End-to-end Latency

  • Pavan Keshavamurthy, Platformatory

In this session, we shall cover kafka-latency-analyzer, a script that can use configurable timestamps to produce quantile reports of latency (focused on average and tail) at a topic level and optionally relay such results downstream to another Kafka topic for analysis.

kash.py - How to Make Your Data Scientists Love Real-time

  • Ralph Matthias Debusmann, Migros

In this session, we show you how kash.py can be used to bring the two disparate worlds of files and streaming together - and thus not only save a lot of time and money hiring real-time and streaming experts, but also make your data scientists, like ours, start loving real-time.

Moving Towards Better Upgrades in Kafka Streams

  • Whitney Steward, Confluent
  • Russ Katz

This talk will encourage movement while we learn, to help bolster cognitive retention and offer a break from normal passive listening. So come learn from two Confluent CSTAs in the industry and make kstreams upgrade strategies muscle memory.

Visualizing the Stream

  • Rick Jacobs, Imply

This talk will explore how Confluent and Imply can be used to visualize streaming data in real-time. We will discuss visualization tools and options including line charts, bar charts, heat maps, geospatial, and scatter plots, and how they can be used to help humans understand data.

Seek and Destroy Kafka Under Replication

  • Edoardo Comar, IBM

This session shows how we successfully measured and evolved our Kafka configuration, with the goal of giving users the best possible experience (and resilience for their data).

Six Different Things You Can Do In Kafka With Geo-Replication

  • Luke Knepper, Confluent

Move it, share it, bridge it, stage it, backup it, optimize it, bop it. Did you know you can do these things with geo-replication in Kafka? (well, except bop it)