Current 2022

View sessions and slides from Current 2022


Apache Kafka: Past, Present, & Future

  • Jun Rao, Confluent
  • Pritha Mehra, The United States Postal Service
  • Chunyan Wang, Pinterest

In this keynote, Jun Rao will focus on the community and ecosystem that power Kafka and the current state of the project, and will recognize recent contributions. We’ll hear how devs and organizations are using Kafka in their businesses and dive deep into what’s coming.

Reimagining Data Pipelines for the Streaming Era

  • Chad Verbowski, Confluent
  • Erica Schultz, Confluent
  • Greg DiMichillie, Confluent
  • Andrew Hartnett, New Relic

Join Confluent executives in this keynote to learn more about the fundamental principles to reinvent data pipelines, so you can rapidly access high-quality, ready-to-use data for your real-time use cases. Hear about the launch of Stream Designer, an innovation in Confluent Cloud.

Welcome to the Streaming Era

  • Jay Kreps, Confluent
  • Gian Merlino, Imply
  • Anush Kumar, Expedia Group

In this keynote, CEO & Cofounder of Confluent, Jay Kreps, joined by fellow industry leaders, will dive into the emergence of data streaming as a full category that, while still having Kafka at its core, has expanded into a broad and growing ecosystem of data movement and real-time technologies.

Breakout Sessions

A Better Kafka Connect With Kubernetes

  • Stefan Sprenger, DataCater
  • Hakan Lofcali, DataCater

This talk proposes a novel, cloud-native deployment model for Kafka Connect, which uses the different concepts of Kubernetes for executing, scaling, and isolating single Kafka Connect connectors. In a nutshell, we build unique container images for each Kafka Connect connector type.

A Crash Course in Designing Messaging APIs

  • Jack Vanlightly, Confluent

In this talk we're going to look at a variety of different messaging APIs, contrasting their features and guarantees with their "heaviness".

Advancing Apache NiFi Framework Security

  • David Handermann, Cloudera

This presentation covers the implementation details involved with automatic certificate generation, password-based key derivation, JSON Web Token signing, repository encryption, and sensitive property management using external services.

An Analytics Engineer’s Guide to Streaming

  • Amy Chen, dbt Labs

In this talk, we will explore what streaming in a batch-based analytics world should look like. How does that change your thoughts about implementing testing and performance optimization in your data pipelines? Do you still need dbt?

Apache Kafka With Spark Structured Streaming

  • Emma Liu, Databricks
  • Nitin Saksena, Albertsons
  • Ram Dhakne, Confluent

In this talk, you will learn:

  • The built-in streaming capabilities of a lakehouse
  • Best practices for integrating Kafka with Spark Structured Streaming
  • How Albertsons architected their data platform for real-time data processing and real-time analytics

Azure Event Hubs - Behind the Scenes

  • Kasun Indrasiri, Microsoft

This session is a look behind the curtain where we dive deep into the architecture of Event Hubs and look at the Event Hubs cluster model, resource isolation, and storage strategies and also review some performance figures.

Bootiful Kafka: Get the Message!

  • Josh Long, VMware

Spring Boot and Apache Kafka are leaders in their respective fields, and it's no surprise that they work well together. Join me, Spring Developer Advocate Josh Long, and we'll look at how to use Spring Boot and Apache Kafka to build better, scalable systems and services.

Breathe In, Breathe Out: Get Kafka Connect Configs Right!

  • Francesco Tisiot, Aiven

We'll talk about streaming data into topics, the data formats to use, and what to look out for when Kafka Connect is plugging data from another platform into your setup. Since we don't live in a perfect world, we'll also cover configurations like error tolerance and dead letter queues.

Buckle Up! Field Notes for Transitioning Your Daily Batch Jobs into Realtime Architecture

  • Valerie Burchby, Netflix
  • Xinran Waibel, Netflix

In this session, we will demystify operational complexity of event streaming in the real data engineering world and share best practices learned from developing and maintaining web-scale data systems at Netflix.

Building a Data Driven Culture and AI Revolution

  • Gregory Little, Department of Defense

In this session, Greg will discuss what it will take to guide the evolution of technology and culture in parallel: leadership, technology that enables rapid scale and a complete & reliable data flow, and a data driven culture.

Building a Data Streaming Center of Excellence

  • Steve Gonzalez, Confluent
  • Derek Kane, Confluent

This talk explores a solution to overcome common roadblocks and delays to realizing value at your organization - building a Data Streaming Center of Excellence (CoE). We will discuss the keys to success including workstreams and services required of a CoE, repeatable standards and guidance and more.

Building an Interactive Query Service in Kafka Streams

  • Bill Bejeck, Confluent

In this talk, I'll discuss and demonstrate what's needed to build an RPC mechanism between Kafka Stream instances, including:

  • The background of Interactive Queries
  • Using Spring Boot to expose your Interactive Query Service
  • How to route queries between app instances.

Building Real-Time Serverless Data Applications

  • Joseph Morais, Confluent
  • Adam Wagner, Amazon Web Services

Join this session to see firsthand how developers are pairing Confluent's cloud-native, serverless Apache Kafka offering with AWS's serverless services to build data apps and platforms that scale.

CDC Stream Processing With Apache Flink

  • Timo Walther, Immerok

In this talk, we highlight what it means for Apache Flink to be a general data processor that acts as a data integration hub. Looking under the hood, we demonstrate Flink's SQL engine as a changelog processor that ships with an ecosystem tailored to processing CDC data and maintaining materialized.

Challenges, Objections, and the Future of Streaming

  • Eric Sammer, Decodable

This talk explores the current state of streaming, the most common objections and the reasons behind them, the massive technical and financial drag this has created, and what needs to change before streaming becomes the default way we process continuous data.

Chaos Engineering and How to Manage Data Stages

  • Adi Polak, Treeverse

A complex data flow is a set of operations to extract information from multiple sources, copy them into multiple data targets while using extract, transformations, joins, filters, and sorts to refine the results.

Choosing the Right Streaming Protocol

  • Sami Ahmed, Confluent
  • Amanda Gilbert, Confluent

In this session, we will set the stage by talking about the strengths and weaknesses of each protocol, and then dive into how Kafka can be leveraged with these different protocols. We will demo different approaches you might take.

Considerations for Abstracting Complexities of a Real-Time ML Platform

  • Zhenzhong Xu, Claypot AI

In this talk, we’ll discuss why ML platforms can benefit from a simple and "invisible" abstraction. We’ll offer some evidence on why you should consider leveraging streaming technologies even if your use cases are not real-time yet.

Dashing off a Dashboard: Livecoding a Kafka App

  • Kris Jenkins

We'll start with an empty directory and by the end, you'll have all the foundational pieces of a dashboard that could serve KPIs to everyone in your organisation, or just form the basis of your next lunchtime hacking session.

Data Governance as a Service

  • Vanessa Burckard, Social Security Administration

Learn how our approach of offering Data Governance as a Service to our customers will help us get ahead of the curve, streamline Kafka adoption for new use cases, and build a reliable Enterprise Data Mesh as we go.

Deep Dive Into Kafka Tiered Storage

  • Satish Duggana, Uber

This talk dives into the internals of tiered storage and how we achieve those semantics, covering scenarios like new brokers being bootstrapped, brokers having hard failures, or other out-of-sync brokers becoming leaders.

Designing Apache Hudi for Incremental Processing

  • Vinoth Chandar, Apache Software Foundation
  • Ethan Guo, Onehouse

In this session, we first introduce Apache Hudi and the key technology gaps it fills in the modern data architecture. Bridging traditional data lakes and warehouses, Hudi helps realize the Lakehouse vision by bringing transactions and optimized table metadata to data lakes.

Don’t Forget About Your Past—Optimizing Apache Druid Performance

  • Neil Buesing, Kinetic Edge

Let’s start with how to run Apache Druid locally in your container-based development environment. While streaming real-time events from Kafka into Druid, an S3-compliant store captures messages via Kafka Connect for historical processing.

Event Driven Infrastructure as Software

  • Lee Briggs, Pulumi

In this talk, we'll take a high-level look at how infrastructure management has evolved, examine some insights from both sides of the DevOps divide and look at how your organisation could look if you want to create an event-driven infrastructure that was also managed like software.

Event Streaming in Academia

  • John DesJardins, Hazelcast

The talk will cover the systematic review workflow and obtained results from the academic literature. It will demonstrate best practices of event streaming and real-time applications in academia and research communities using Google Scholar for scholarly literature search.

Evolving Schemas Without Schema Evolution

  • Andreas Evers, KOR Financial

Upcaster chains allow you to read an old version of a message and bring it to what your logic needs today. The upcasters in the chain describe how to jump from one version to the next. They describe what your logic expects instead of covering all the possible variations that were ever published.

Extending the Apache Kafka® Replication Protocol Across Clusters

  • Sanjana Kaundinya, Confluent

In this talk, we will go over how you can use the existing replication protocol across clusters. You will learn how to use Cluster Linking to run a multi-region data streaming deployment without the burden and operational overhead of running yet another data system.

Fan-in Flames: Scaling Kafka to Millions of Producers

  • Ryanne Dolan, LinkedIn

This talk discusses a few real-world applications where high fan-in becomes a problem, and presents a few strategies for dealing with it.

From Monoliths to Microservices - A Journey With Confluent

  • Gayathri Veale, Indeed

If you’re in discussions surrounding engineering platforms at your organization then this talk is for you. If you are a data driven engineering organization with solid leadership with sound decisions behind it, join us for this talk and let’s have a discussion.

Getting More From Your Data

  • Kal Yella, Microsoft
  • Luciano Moreira, Microsoft
  • Jacob Bogie, Confluent

Join Microsoft’s Kal Yella, Luciano Moreira, and Confluent’s Jacob Bogie to learn how you can connect multi-cloud and hybrid data to Azure cloud, reducing the complexity and cost associated with building real-time applications and analytics in the cloud.

Getting Started With Spark Structured Streaming

  • Dustin Vannoy, Dustin Vannoy Consulting

This session shares techniques for data engineers who are new to building streaming pipelines with Spark Structured Streaming. It covers how to implement real-time stream processes with Apache Spark and Apache Kafka.

GitOps for Event-Driven Architecture -- Kube-Style!

  • Duncan Doyle, Red Hat

In this session, we will show how KCP can be used to transform the way you deploy, manage and maintain your event streaming application architecture, topology and deployments.

Going Multiplayer With Kafka

  • Ben Gamble, Aiven

Today we’ll walk through building multi-user and multiplayer spaces for games, collaboration, and for creation, leveraging Apache Kafka® for state management, and stream processing to handle conflicts and atomic edits.

High Performant Multi Resource Transaction

  • Kallol Duttagupta, Morgan Stanley
  • Arun Maroli, Morgan Stanley

In this talk we will describe how we addressed each one of these challenges to deliver a modernized, real time trade settlement solution giving attendees the information they need to tackle event driven architecture in the financial data space.

How Kafka Powers a Popular Vector Database System

  • Charles Xie, Zilliz
  • Frank Liu, Zilliz

We will walk through the challenges of unified streaming and batching in vector data processing, as well as the design choices and the Kafka-based data architecture.

How Netflix Manages $18B of Content Spend

  • Brian Orth, Netflix
  • David Johnson, Netflix

As a business, how does Netflix ensure that our forecasted spend is accurate? How do we enable systems and business processes to be able to move in a highly aligned, loosely coupled way that is so critical to the Netflix Culture?

How to Design a Kafka Architecture Resilient to Cloud Outages

  • Julie Wiederhold, Confluent

In this talk, we’ll discuss these in-depth, along with questions you should ask yourself to guide you to the architecture that solves your business needs.

HTTP/2 Streaming APIs for Full Stack Real-Time Applications

  • Chris Sachs

We’ll demonstrate real-time maps that dynamically stream the live state of thousands of real-world entities, while only streaming what’s actually visible on screen at any given time. And we’ll close with a whirlwind tour of UX design patterns that showcase how streaming APIs can create live windows.

If Streaming Is the Answer, Why Are We Still Doing Batch?

  • Adi Polak, Treeverse
  • Tyler Akidau, Snowflake
  • Amy Chen, dbt Labs
  • Eric Sammer, Decodable

This panel brings together industry experts with decades of experience building and implementing data systems—both batch and streaming. In a pragmatic look at the landscape, they'll discuss the state of streaming adoption today, if streaming will ever fully replace batch—and indeed.

Implementing End-To-End Tracing

  • Roman Kolesnev, Confluent
  • Antony Stubbs, Confluent

This talk will walk through how to use and extend OpenTelemetry Java agent auto instrumentation to achieve full end-to-end traceability in Kafka event streaming architectures involving multi-cluster deployments, the Connect platform, stateful KStream applications and ksqlDB workloads.

Improving the Reliability of Market Data Subscription Feeds

  • Ruchir Vani, Nasdaq

In this talk we will discuss those challenges and introduce the Nasdaq Cloud Data Service SDK, an Open Source library for Kafka Consumers that tackles these issues and allows for uniform resilience, performance and operations among varied client configurations.

Introducing KRaft: Kafka Without ZooKeeper

  • Colin McCabe, Confluent

Apache Kafka without ZooKeeper is now production ready! This talk is about how you can run without ZooKeeper, and why you should.

Introduction to Apache Pinot

  • Tim Berglund, StarTree

In this talk, you'll learn how Pinot is put together and why it performs the way it does. You'll leave knowing its architecture, how to query it, and why it's a critical infrastructure component in the modern data stack, particularly in combination with architecture based on Kafka.

Kafka Client-Broker Interactions – What You Don’t See

  • Tom Bentley, Red Hat

Following this talk you’ll know how the Kafka client protocols work in detail and be able to tell your leaders from coordinators! The next time you have a problem you will not only be able to debug it more easily but also understand how to best utilize the Kafka protocol for your applications.

Keep Your Cache Always Fresh With Debezium!

  • Gunnar Morling, Decodable

Join us for this session to learn how to keep read views of your data in distributed caches close to your users, always kept in sync with your primary data stores through change data capture.

Keepin’ It Real(-Time)

  • Nadine Farah, Rockset

In this tech talk, we’ll cover these aforementioned considerations in detail. We’ll show you how to build a SQL-based, real-time recommendation engine and customer 360 data application using Kafka, Rockset, and Retool.

Key Metrics To Uncover the Root Cause of Kafka Performance Anomalies

  • Daniel Kim, New Relic
  • Antón Rodríguez, New Relic

In this talk, we will take a close look at Kafka’s architecture as well as the key infrastructure, JVM, and system metrics you should monitor for each of its components. Then, we will walk through how to diagnose common Kafka performance anomalies through observing patterns in the metrics.

Knock Knock, Who’s There?

  • Justin Chen, Shopify
  • Dhruv Jauhar, Shopify

Previously at Shopify, a single SSL certificate was used by nearly all clients to connect to our Kafka clusters. As Kafka distinguishes users based on their certificate’s subject, all clients were masked as the same user, and thus we were unable to identify who was connecting.

Koala Counting With Kafka

  • Simon Aubury, Simple Machines

This project is a demonstration of using a Raspberry Pi and camera, Apache Kafka, and Kafka Connect to identify and classify animals. Stream transformation performed using ksqlDB processes the individual animal observations to generate dashboards to understand population trends over time.

Let's Monitor The Conditions at the Conference

  • Timothy Spann, StreamNative
  • David Kjerrumgaard, StreamNative

Let's bring this to the different spots around the conference including lunch tables, vendor booths, hotel rooms, and more. I need to know about these readings now, not when I get back home from the conference.

Many Sources, Many Sinks, One Stream

  • Joel Eaton, Red Hat

In this session we’ll introduce the concept of the Canonical Stream, an ordered, declarative event stream of information about a thing that exists in the real world, with its own context and governance. The Canon is technology agnostic, and data context agnostic.

Mitigating One Million Security Threats With Kafka and Spark

  • Arun Janarthnam, Citrix

In this session, we will talk about how, in the last 6 months, 7M risk indicators were triggered and 1M threat mitigating actions were taken, and the integral role Kafka played in achieving it. We would also like to share some interesting ways Kafka is used at Citrix.

Modern Data Flow: A Better Way of Building Data Pipelines

  • Andrew Sellers, Confluent

In this session we'll review the Modern Data Flow principles, and discuss them in the context of trends in the data landscape and modern software engineering practices.

Navigating Your Data Landscape

  • Siddharth Desai, Google Cloud
  • Elena Cuevas, Confluent

In this session, learn how organizations can unlock data value using best-in-class, cloud native products on Google Cloud and its partners such as Confluent.

Next Gen Data Modeling in the Open Data Platform

  • Doron Porat, Yotpo
  • Liran Yogev, ZipRecruiter

In this talk, we'll share from our journey redesigning the data lake, and how to best address organizational needs, without having to give up on high-end tooling and technology. We are taking this to the next level.

Off the Chain: Scaling Blockchain Data With Kafka

  • Jan Svoboda, Confluent
  • Alex Stuart, Confluent

This session will explain how slow data on the blockchain can be joined together with fast data in Kafka and published out to other systems. Jan and Alex (two of Confluent’s resident crypto fans) will walk through a prototype of a distributed blockchain application.

OH: That microservice should have been a SQL query

  • Seth Wiesman, Materialize Inc

This talk will provide a hands-on look at Materialize and show how it can be used to simplify your application development.

One Year In – Lessons Learned and Plans for the Future

  • Robert Ezekiel, Booz Allen Hamilton

To improve on the speed of benefits and services delivered at the Veterans Affairs (VA), we implemented Kafka last year with a few products in production. In our talk, we will talk through some of the challenges and lessons learned from adopting an event driven architecture.

Optimizing for Low Latency and High Throughput

  • Artem Livshits, Confluent

In this talk I'll cover a simple, but effective algorithm for auto-tuning effective batch size for low latency and high throughput, adaptive partitioning logic to direct more data to faster brokers, and go through benchmark results that illustrate effectiveness of the new Sticky Partitioner.

Practical Pipelines: A Houseplant Soil Alerting System With ksqlDB

  • Danica Fine, Confluent

In this session, I’ll talk about how I ingest the data, followed by a look at the tools, including ksqlDB and Kafka Connect, that will help transform the raw data into useful information.

Processing Kafka Data in Real-Time With ksqlDB

  • Michael Drogalis, Confluent

In this talk, we’ll step through the basics of stream processing through ksqlDB, a Kafka-native, SQL-based stream processor. You’ll learn about its core abstractions, how it works, and how you can use it to build modern data pipelines.

Put a Topic on It

  • Mitch Gitman, T-Mobile

In this talk, I'll explain what we call inbound and outbound Kafka topics and use those concepts as the launching pad to discuss:

  • The importance of separating data capture from data processing.
  • The power of Kafka as a circuit breaker.

Real-Time Inter-Agency Data Sharing With Kafka

  • Rob Brown, US Citizenship and Immigration Services

US Government agencies are required to share large volumes of data to enable them to execute on their critical missions. Sharing data across agencies is required for implementing US immigration and naturalization processes and for issuing passports and visas.

Real-Time Processing of Spatial Data Using Kafka Streams

  • Ian Feeney, Confluent
  • Roman Kolesnev, Confluent

In this talk, we will first set the scene with a geospatial 101. Then, using a simplified taxi hailing use case, we will look at two approaches for processing spatial data with Kafka Streams.

Reimagining Customer Experiences With Confluent

  • Phani Bhattiprolu, Slower
  • Ram Dhakne, Confluent

In this session we will showcase how Confluent and Slower partner together to help customers overcome challenges and realize the true value of Confluent Cloud.

Rethinking State Management in Cloud-Native Streaming Systems

  • Yingjun Wu, RisingWave Labs

Stream processing is becoming increasingly essential for extracting business value from data in real-time. To achieve strict user-defined SLAs under constantly changing workloads, modern streaming systems have started taking advantage of the cloud for scalable and resilient resources.

Running production CDC ingestion pipelines at scale in Robinhood

  • Balaji Varadarajan, Robinhood
  • Pritam K Dey, Robinhood

In this talk, we will describe the evolution of change data capture based ingestion in Robinhood not only in terms of the scale of data stored and queries made, but also the use cases that it supports. We will go in-depth into the CDC architecture built around our Kafka ecosystem.

Running Thousands of Kafka Clusters on AWS

  • Mehari Beyene, Amazon Web Services
  • Tom Schutte, Amazon Web Services

We’ll talk about several topics including (a) monitoring Kafka health, (b) optimizing Kafka to address compute, storage and networking bottlenecks, (c) automating detection and mitigation of infrastructure failures related to compute, storage and networking and (d) continuous software patching.

Speed Up Your Kubernetes Upgrades for Your Kafka Clusters

  • Vanessa Vuibert, Shopify

I will go over how to stretch a Kafka cluster across the old and new Kubernetes clusters without adding any extra brokers. Finally, I will discuss how the Kafka brokers in the new Kubernetes cluster get scaled up while the old one gets decommissioned.

SQL Extensions To Support Streaming Data

  • Fabian Hueske, Snowflake

This talk will look at:

  • Why is this happening?
  • Who is involved?
  • How does the process work?
  • What progress has been made?
  • When can we expect to see a standard?

Streaming 101 Revisited: A Fresh Hot Take

  • Tyler Akidau, Snowflake
  • Dan Sotolongo, Snowflake

This talk will cover the key concepts of stream processing theory as we understand them today. It is simultaneously an introductory talk as well as an advanced survey on the breadth of stream processing theory. Anyone with an interest in streaming should find something engaging within.

Streaming Data Into Your Lakehouse

  • Frank Munz, Databricks

This talk is for data architects who are not afraid of some code and for data engineers who love open source and cloud services.

Streaming SQL for Data Engineers: The Next Big Thing?

  • Yaroslav Tkachenko, Goldsky

In this presentation, I hope to share the discoveries I made over the years in this area, as well as working practices and patterns I’ve seen.

Streaming Time Series Data

  • Kenny Gorman, MongoDB
  • Elena Cuevas, Confluent

In this talk, Kenny Gorman and Elena Cuevas will present how Apache Kafka on Confluent Cloud can stream massive amounts of data to Time Series Collections via the MongoDB Connector for Apache Kafka.

Team Collaboration in Kafka Clusters

  • Maria Berinde-Tampanariu, Confluent

What are the options offered by the Kafka built-in Authorizer, how can the Authorizer be customized and how are integrations with external systems built in order to provide group or role-based access control?

Testing Kafka Containers With Testcontainers: There and Back Again

  • Viktor Gamov, Kong

In this session, Viktor talks about Testcontainers, a library (initially created for the JVM, it now exists in many languages) that provides lightweight, disposable instances of shared databases, clusters, and anything else that can run in a Docker container!

The End of Big Data

  • Benn Stancil, Mode

In this talk, I’ll share why the next wave of successful data companies will follow the same pattern. Rather than trying to change how we work, they’ll find ways to unambiguously improve it.

The Metamorphosis of Database Changes

  • Tim Steinbach, Shopify

This talk describes our journey of ingesting multiple Kafka data streams from thousands of topics and about half a million partitions and storing Apache Iceberg datasets, and explains the issues encountered along the way.

The Next Generation of the Consumer Rebalance Protocol

  • David Jacot, Confluent

This talk will unveil the next generation of the consumer rebalance protocol for Apache Kafka (KIP-848) that addresses the shortcomings of the current protocol. We will go through the evolution of the current rebalance protocol, discuss its shortcomings, and present the new rebalance protocol.

The Shitposting AI

  • Thomas Endres, TNG Technology Consulting GmbH
  • Jonas Mayer, TNG Technology Consulting GmbH

In this talk, we will give an introduction to NLP, focussing on the concepts of STT, Text Generation and TTS. Using live demos, we will guide you through the process of scraping social media comments, training a text generation model, synthesizing millions of voices and building IoT robot heads.

Towards Client-Side Field-Level Cryptography

  • Hans-Peter Grahsl, Red Hat

During this demo-driven talk, you will experience how to benefit from

  • a configurable single message transformation (SMT) that lets you perform encryption and decryption operations in Kafka Connect worker nodes without any additional code

Unbundling the Modern Streaming Stack

  • Dunith Dhanushka, Redpanda

This talk first explores the "classic streaming stack," based on the Lambda architecture, its origin, and why it didn't pick up amongst data-driven organizations. The modern streaming stack (MSS) is a lean, cloud-native, and economical alternative to classic streaming architectures.

Utilizing Point-in-Time Queries in Event-Based Systems

  • Bobby Calderwood, Evident Systems

In this talk, we'll discuss how the oNote team implemented a point-in-time queryable Event Model repository using Kafka, Git, and CRDTs. We'll also discuss some other technologies that facilitate this pattern.

Webscale Workflow Engine With Kafka

  • Andrey Falko, Salesforce

In this talk, we introduce a workflow engine concept that only uses Kafka to persist state transitions and execution results. The system banks on Kafka’s high reliability, transactionality, and high scale to keep setup and operating costs low.

Welcome to Kafka, We’re Glad You’re Here

  • Dave Klein, Tabular

I’ll take you through the basics of Kafka—the brokers, the partitions, the topics—and then on and up into the different APIs and tools available to work with it. Consider it a Kafka 101, if you will. We’ll stay at a high level, but we’ll cover a lot of ground.

What’s Slowing Down Your Kafka Pipeline?

  • Ruizhe Cheng, New Relic
  • Pete Stevenson, New Relic

In a live demo, we will introduce an eBPF-based, always-on, CPU profiler to visualize what your Kafka applications are spending time on. We will analyze how much time the Kafka broker spends on handling different requests and responding to polling.

What’s up With Availability in Kafka?

  • Justine Olshan, Confluent

Using Apache Kafka and Confluent Cloud as a case study, we will dig deeper into how to define good SLOs and SLAs for distributed systems. From there we will discuss ways to improve availability and the changes we made to Confluent Cloud to improve on Kafka's availability story.

When Kafka Is the Source of Truth

  • Ricardo Ferreira, Amazon Web Services

In this session, we will get into the weeds of data serialization with schemas. We will discuss the differences between formats like JSON, Avro, Thrift, and Protocol Buffers, and how your code must use each one of them to serialize data.

When Streaming Needs Batch

  • Konstantin Knauf, Confluent

In this talk, I'll introduce Apache Flink's approach to unified stream and batch processing and discuss - by example - how these scenarios can already be addressed today and what might be possible in the future.

Why Can’t the Business Get Behind Streaming?!

  • Becky Gandillon, Centric Consulting

During this session, you'll learn about how to communicate the value of technology decisions to non-technical co-workers or stakeholders. And we'll talk about some very specific buy-in, enablement, and adoption activities and suggestions for supporting streaming implementations.

Why Wait? Realtime Ingestion

  • Heng Zhang, Pinterest
  • Chen Qin, Pinterest

In this talk, we plan to share our near-real-time ingestion system built on top of Apache Kafka, Apache Flink, and Apache Iceberg. We pick ANSI SQL as the common currency to minimize the "lambda architecture" learning curve of teams adopting fresh, near-real-time data.

Wikipedia’s Event Data Platform, Or: JSON Is Okay Too

  • Andrew Otto, Wikimedia Foundation

This session will describe how and why we built Wikimedia's Event Data Platform using Kafka, JSON and JSONSchemas, and how we make our event data available to the world.

You Put *What* in Your Stream?! Patterns and Practices for Event Design

  • Adam Bellemare, Confluent

In this talk, Adam covers the main considerations of modeling and implementing events. Data is often modeled as a Fact or a Delta, though the distinction isn't always clear.

You’re Spiky and We Know It

  • Ravindra Bhanot, Twilio

This talk elaborates the challenges that Twilio faced when building such a monitoring platform, which can aggregate customer data and send alerts in a timely manner under SLA.

Zero Down Time Move From Apache Kafka to Confluent

  • Justin Dempsey, SAS Institute

This session details the journey of moving standalone Kafka to Kafka on K8S. During the session, the scope of the journey, including Total Cost of Ownership (TCO), technical architecture, and the migration itself, will be discussed.

Lightning Talks

Apache Flink Adoption at Shopify

  • Kevin Lam, Shopify

In this talk, we go over the history and future of Apache Flink adoption at Shopify.

We’ll talk about how and why we went from choosing Apache Flink as the replacement for our existing streaming technologies in 2021, to a year later with a flourishing streaming community.

Balance Your Data Across Apache Kafka Partitions

  • Olena Kutsenko, Aiven

In this talk we'll discuss mechanisms you can use to balance your data, such as keys, composite message keys, the role of hashing, custom partitions, and other things you need to keep in mind when splitting data across partitions.

Build a Streaming Graph Pipeline on Kafka With Quine

  • Ryan Wright, thatDot

In this live-coding lightning talk, we'll start from scratch and build a streaming graph data pipeline from start to finish. With our data in Kafka, Quine plugs in and requires just a graph query written in the Cypher graph query language.

Building a Highly Reliable Enterprise Infrastructure

  • Grace Zhang, Citigroup

We will share how we drive data streaming readiness by standardizing Kafka clusters among divergent payment application demands, and how we overcome the challenge of designing and implementing Kafka enterprise infrastructure to meet business requirements.

Dead Letter Queues for Kafka Consumers in Robinhood

  • Sreeram Ramji, Robinhood
  • Wenlong Xiong, Robinhood

This talk discusses how we built libraries, templated microservices, and tooling that leverage Postgres and Kafka for safely dealing with dead letters, inspecting and querying them, and republishing them to retry Kafka topics for safe reprocessing at a later time.

Designing a Feedback Loop for Event-Driven Data Sharing

  • Teresa Wang, Jet Propulsion Laboratory

In this talk, we will discuss how we overcame these challenges and delivered a fully automated and robust data exchange solution by extending Kafka Connect, leveraging ksqlDB streams/tables and aggregations, and developing custom microservices.

Designing Topic Structures for Data Resiliency and Disaster Recovery

  • Justin Lee, Confluent

In this talk, we'll discuss the actual implementation details for the clients and topics that live in multi-cluster environments, including: What naming conventions and patterns should be followed for topics in a multi-cluster architecture? How does this differ between applications?

Ducktape: Keeping System Testing Simple in a Distributed World

  • Ian McDonald, Confluent

In this talk, we will go over how Ducktape solves the problem of multi-service distributed testing, what type of testing it is designed for, and how it simplifies the testing experience for complex real time systems. Get ready to get your hands dirty and learn how to write a test and a service.

Getting Advertiser Budget Just Right at Reddit

  • Sundeep Yedida, Reddit
  • Nagalakshmi Ramasubramanian

In this talk, we will learn how we leveraged Kafka and Druid to provide real-time aggregations of spend against both daily and lifetime budgets. This led to significant decreases in overdelivery compared to the previous batch system, and savings of $LARGE_NUMBER_OF_DOLLARS

Handing Failure With Grace in Kafka Streams

  • Walker Carlson, Confluent

In this talk, we will cover the changes to the threading model that made more dynamic error handling possible. We will also introduce the Streams handler, which unlocked options to react immediately in cases that would previously cause cascading thread death.

How PubSub Helped Build Vox Media’s Data Applications

  • Movses Musaelian, Vox Media

This talk will discuss practical tips for architecting and productionalizing scalable and latent data applications that leverage the PubSub model. Attendees will learn about common data messaging capabilities found through the PubSub model and how to leverage PubSub to optimize the performance

How Snowflake Sink Connector Uses Snowpipe’s Streaming Ingestion Feature

  • Jay Patel, Snowflake

We’ll discuss streaming ingestion into Snowflake with Snowpipe Streaming and how we utilized it with the Snowflake Sink Connector for Kafka. We will talk about the improvements and then jump into a demo that uses Docker containers to spin up a Kafka and Kafka Connect environment to load data

Lowering the Barrier to Stream Processing

  • Alex Morley, Babylon Health

We found that by using the "agent" concept in Faust we could provide our engineers with a "Function as a Service"-like experience specifically for processing events on Kafka streams.

Monitoring Exascale Supercomputers

  • Tim Osborne, Oak Ridge National Lab

In this talk we will discuss scaling and planning a system to meet the streaming demands of the world’s only exascale and most energy-efficient supercomputer. Tune in to learn more about HPC and how streaming fits into monitoring large-scale systems.

Online Machine Learning on Streaming Data With River and Bytewax

  • Zander Matheson, Bytewax

In this session we will look at how to leverage the Python libraries River and Bytewax to build streaming applications on Kafka that use online machine learning techniques.

Running Kafka On Pi4/ARM

  • Jeffrey Needham, Confluent

This talk provides a work-in-progress update of deploying Kafka on aarch64 Linux. Although the new Apple M1 is ARMv8 based, it has a distinct flavor, or ELF format - arm64. Since much of Kafka consists of noarch rpms, or simply, a bag-o-jars, both Linux and macOS have native implementations of Java

Successful Fraud Detection in Real-Time MMORPG Games

  • Abbey Kwak, Kakao

This talk shares the know-how we gained from game operation: centralizing the logs generated in live games, especially MMORPGs, and detecting operational anomalies using more than 300 patterns in ksqlDB.

Tactical Virtual Assistance (TVA)

  • Jubal Biggs, SAIC

How can the DOD manage military battlefield assets, including integrating signals from a diverse and dynamic set of sensors, such as static ground sensors and soldier-worn sensors, to provide predictive and operational analytics?

Under the Covers: Segments of Apache Kafka

  • Kirill Kulikov, Confluent

In this presentation, we are going to take a deep dive into the internals of Kafka's log mechanisms. We will look in detail at the structure of the commit log and its segments, the arrangement of topic partitions on disk, and log retention under the compact and delete policies.

Verifying Apache Kafka-Based Data Pipelines

  • Subhangi Agarwala, Bloomberg

In this talk, we aim to highlight the importance of integration testing, a critical verification method for stable and reliable large-scale distributed streaming applications. We will also provide a high level overview of our system, challenges faced in moving to a streaming infrastructure

When NOT to Use Apache Kafka?

  • Kai Waehner, Confluent

When NOT to use Apache Kafka? What limitations does the event streaming platform have? When does Kafka simply not provide the needed capabilities? How do you qualify Kafka out when it is not the right tool for the job? This session explores the DOs and DON'Ts.