View sessions and slides from Current 2022
In this keynote, Jun Rao will focus on the community and ecosystem that power Kafka, review the current state of the project, and recognize recent contributions. We’ll hear how devs and organizations are using Kafka in their businesses and dive deep into what’s coming.
Join Confluent executives in this keynote to learn more about the fundamental principles to reinvent data pipelines, so you can rapidly access high-quality, ready-to-use data for your real-time use cases. Hear about the launch of Stream Designer, an innovation in Confluent Cloud.
In this keynote, CEO & Cofounder of Confluent, Jay Kreps, joined by fellow industry leaders, will dive into the emergence of data streaming as a full category that, while still having Kafka at its core, has expanded into a broad and growing ecosystem of data movement and real-time technologies.
This talk proposes a novel, cloud-native deployment model for Kafka Connect, which uses the different concepts of Kubernetes for executing, scaling, and isolating single Kafka Connect connectors. In a nutshell, we build unique container images for each Kafka Connect connector type.
In this talk we're going to look at a variety of different messaging APIs, contrasting their features and guarantees with their "heaviness".
This presentation covers the implementation details involved with automatic certificate generation, password-based key derivation, JSON Web Token signing, repository encryption, and sensitive property management using external services.
In this talk, we will explore what streaming in a batch-based analytics world should look like. How does that change your thoughts about implementing testing and performance optimization in your data pipelines? Do you still need dbt?
In this talk, you will learn:
This session is a look behind the curtain where we dive deep into the architecture of Event Hubs and look at the Event Hubs cluster model, resource isolation, and storage strategies and also review some performance figures.
Spring Boot and Apache Kafka are leaders in their respective fields, and it's no surprise that they work well together. Join me, Spring Developer Advocate Josh Long, and we'll look at how to use Spring Boot and Apache Kafka to build better, scalable systems and services.
We'll talk about streaming data into topics, the data formats to use, and what to look out for when Kafka Connect is plugging data from another platform into your setup. Since we don't live in a perfect world, we'll also cover configurations like error tolerance and dead letter queues.
In this session, we will demystify operational complexity of event streaming in the real data engineering world and share best practices learned from developing and maintaining web-scale data systems at Netflix.
In this session, Greg will discuss what it will take to guide the evolution of technology and culture in parallel: leadership, technology that enables rapid scale and a complete & reliable data flow, and a data driven culture.
This talk explores a solution to overcome common roadblocks and delays to realizing value at your organization - building a Data Streaming Center of Excellence (CoE). We will discuss the keys to success including workstreams and services required of a CoE, repeatable standards and guidance and more.
In this talk, I'll discuss and demonstrate what's needed to build an RPC mechanism between Kafka Stream instances, including:
Join this session to see firsthand how developers are pairing Confluent's cloud-native, serverless Apache Kafka offering with AWS's serverless services to build data apps and platforms that scale.
In this talk, we highlight what it means for Apache Flink to be a general data processor that acts as a data integration hub. Looking under the hood, we demonstrate Flink's SQL engine as a changelog processor that ships with an ecosystem tailored to processing CDC data and maintaining materialized.
This talk explores the current state of streaming, the most common objections and the reasons behind them, the massive technical and financial drag this has created, and what needs to change before streaming becomes the default way we process continuous data.
A complex data flow is a set of operations that extract information from multiple sources and copy it into multiple data targets, using extracts, transformations, joins, filters, and sorts to refine the results.
In this session, we will set the stage by talking about the strengths and weaknesses of each protocol, and then dive into how Kafka can be leveraged with these different protocols. We will demo different approaches you might take.
In this talk, we’ll discuss why ML platforms can benefit from a simple and "invisible" abstraction. We’ll offer some evidence on why you should consider leveraging streaming technologies even if your use cases are not real-time yet.
We'll start with an empty directory and by the end, you'll have all the foundational pieces of a dashboard that could serve KPIs to everyone in your organisation, or just form the basis of your next lunchtime hacking session.
Learn how our approach of offering Data Governance as a Service to our customers will help us get ahead of the curve, streamline Kafka adoption for new use cases, and build a reliable Enterprise Data Mesh as we go.
This talk dives into the internals of tiered storage and how we achieve those semantics, covering scenarios such as new brokers being bootstrapped, brokers having hard failures, or other out-of-sync brokers becoming leaders.
In this session, we first introduce Apache Hudi and the key technology gaps it fills in the modern data architecture. Bridging traditional data lakes and warehouses, Hudi helps realize the Lakehouse vision, by bringing transactions, optimized table metadata to data lakes
Let’s start with how to run Apache Druid locally in your container-based development environment. While real-time events stream from Kafka into Druid, an S3-compliant store captures messages via Kafka Connect for historical processing.
In this talk, we'll take a high-level look at how infrastructure management has evolved, examine some insights from both sides of the DevOps divide, and look at how your organisation could look if you want to create an event-driven infrastructure that is also managed like software.
The talk will cover the systematic review workflow and obtained results from the academic literature. It will demonstrate best practices of event streaming and real-time applications in academia and research communities using Google Scholar for scholarly literature search.
Upcaster chains allow you to read an old version of a message and bring it to what your logic needs today. The upcasters in the chain describe how to jump from one version to the next. They describe what your logic expects instead of covering all the possible variations that were ever published.
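The chain described above can be sketched in a few lines. This is a minimal illustration, not the speaker's implementation: the message fields (`name`, `first_name`, `last_name`, `country`), the `version` field, and the function names are all hypothetical. Each upcaster only knows how to move a message from one version to the next, and the chain applies them in order until the message reaches the version the current logic expects.

```python
from typing import Any, Callable, Dict

Message = Dict[str, Any]
Upcaster = Callable[[Message], Message]

CURRENT_VERSION = 3  # the version today's logic expects (hypothetical)


def v1_to_v2(msg: Message) -> Message:
    # Hypothetical change: v2 split "name" into "first_name" and "last_name".
    first, _, last = str(msg["name"]).partition(" ")
    out = {k: v for k, v in msg.items() if k != "name"}
    out.update(first_name=first, last_name=last, version=2)
    return out


def v2_to_v3(msg: Message) -> Message:
    # Hypothetical change: v3 added "country"; old messages get a default.
    return {**msg, "country": msg.get("country", "unknown"), "version": 3}


# One upcaster per version step, keyed by the version it reads.
UPCASTERS: Dict[int, Upcaster] = {1: v1_to_v2, 2: v2_to_v3}


def upcast(msg: Message) -> Message:
    # Apply one step at a time until the message is at CURRENT_VERSION.
    while int(msg["version"]) < CURRENT_VERSION:
        msg = UPCASTERS[int(msg["version"])](msg)
    return msg


old = {"version": 1, "name": "Ada Lovelace"}
new = upcast(old)
```

Adding a future version 4 would mean writing only a `v3_to_v4` step; the business logic keeps reading a single, current shape.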
In this talk, we will go over how you can use the existing replication protocol across clusters. You will learn how to use Cluster Linking to run a multi-region data streaming deployment without the burden and operational overhead of running yet another data system.
This talk discusses a few real-world applications where high fan-in becomes a problem, and presents a few strategies for dealing with it.
If you’re in discussions surrounding engineering platforms at your organization then this talk is for you. If you are a data driven engineering organization with solid leadership with sound decisions behind it, join us for this talk and let’s have a discussion.
Join Microsoft’s Kal Yella, Luciano Moreira, and Confluent’s Jacob Bogie to learn how you can connect multi-cloud and hybrid data to Azure cloud, reducing the complexity and cost associated with building real-time applications and analytics in the cloud.
This session shares techniques for data engineers who are new to building streaming pipelines with Spark Structured Streaming. It covers how to implement real-time stream processes with Apache Spark and Apache Kafka.
In this session, we will show how KCP can be used to transform the way you deploy, manage and maintain your event streaming application architecture, topology and deployments.
Today we’ll walk through building multi-user and multiplayer spaces for games, collaboration, and for creation, leveraging Apache Kafka® for state management, and stream processing to handle conflicts and atomic edits.
In this talk we will describe how we addressed each one of these challenges to deliver a modernized, real-time trade settlement solution, giving attendees the information they need to tackle event-driven architecture in the financial data space.
We will walk through the challenges of unified streaming and batching in vector data processing, as well as the design choices and the Kafka-based data architecture.
As a business, how does Netflix ensure that our forecasted spend is accurate? How do we enable systems and business processes to be able to move in a highly aligned, loosely coupled way that is so critical to the Netflix Culture?
In this talk, we’ll discuss these in-depth, along with questions you should ask yourself to guide you to the architecture that solves your business needs.
We’ll demonstrate real-time maps that dynamically stream the live state of thousands of real-world entities, while only streaming what’s actually visible on screen at any given time. And we’ll close with a whirlwind tour of UX design patterns that showcase how streaming APIs can create live windows.
This panel brings together industry experts with decades of experience building and implementing data systems—both batch and streaming. In a pragmatic look at the landscape, they'll discuss the state of streaming adoption today, if streaming will ever fully replace batch—and indeed.
This talk will walk through how to use and extend OpenTelemetry Java agent auto instrumentation to achieve full end-to-end traceability in Kafka event streaming architectures involving multi-cluster deployments, the Connect platform, stateful KStream applications and ksqlDB workloads.
In this talk we will discuss those challenges and introduce the Nasdaq Cloud Data Service SDK, an Open Source library for Kafka Consumers that tackles these issues and allows for uniform resilience, performance and operations among varied client configurations.
Apache Kafka without ZooKeeper is now production ready! This talk is about how you can run without ZooKeeper, and why you should.
In this talk, you'll learn how Pinot is put together and why it performs the way it does. You'll leave knowing its architecture, how to query it, and why it's a critical infrastructure component in the modern data stack, particularly in combination with architecture based on Kafka.
Following this talk you’ll know how the Kafka client protocols work in detail and be able to tell your leaders from coordinators! The next time you have a problem you will not only be able to debug it more easily but also understand how to best utilize the Kafka protocol for your applications.
Join us for this session to learn how to keep read views of your data in distributed caches close to your users, always kept in sync with your primary data stores through change data capture.
In this tech talk, we’ll cover these aforementioned considerations in detail. We’ll show you how to build a SQL-based, real-time recommendation engine and customer 360 data application using Kafka, Rockset, and Retool.
In this talk, we will take a close look at Kafka’s architecture as well as the key infrastructure, JVM, and system metrics you should monitor for each of its components. Then, we will walk through how to diagnose common Kafka performance anomalies through observing patterns in the metrics.
Previously at Shopify, a single SSL certificate was used by nearly all clients to connect to our Kafka clusters. As Kafka distinguishes users based on their certificate’s subject, all clients were masked as the same user, and thus we were unable to identify who was connecting.
This project is a demonstration of using a Raspberry Pi and camera, Apache Kafka, and Kafka Connect to identify and classify animals. Stream transformations performed using ksqlDB process the individual animal observations to generate dashboards for understanding population trends over time.
Let's bring this to the different spots around the conference including lunch tables, vendor booths, hotel rooms, and more. I need to know about these readings now, not when I get back home from the conference.
In this session we’ll introduce the concept of the Canonical Stream, an ordered, declarative event stream of information about a thing that exists in the real world, with its own context and governance. The Canon is technology agnostic, and data context agnostic.
In this session, we will talk about how, in the last 6 months, 7M risk indicators were triggered and 1M threat mitigating actions were taken, and the integral role Kafka played in achieving it. We would also like to share some interesting ways Kafka is used at Citrix.
In this session we'll review the Modern Data Flow principles, and discuss them in the context of trends in the data landscape and modern software engineering practices.
In this session, learn how organizations can unlock data value using best-in-class, cloud native products on Google Cloud and its partners such as Confluent.
In this talk, we'll share from our journey redesigning the data lake, and how to best address organizational needs, without having to give up on high-end tooling and technology. We are taking this to the next level.
This session will explain how slow data on the blockchain can be joined together with fast data in Kafka and published out to other systems. Jan and Alex (two of Confluent’s resident crypto fans) will walk through a prototype of a distributed blockchain application.
This talk will provide a hands-on look at Materialize and show how it can be used to simplify your application development.
To improve the speed at which benefits and services are delivered at Veterans Affairs (VA), we implemented Kafka last year with a few products in production. In our talk, we will walk through some of the challenges and lessons learned from adopting an event-driven architecture.
In this talk I'll cover a simple but effective algorithm for auto-tuning the effective batch size for low latency and high throughput, adaptive partitioning logic to direct more data to faster brokers, and benchmark results that illustrate the effectiveness of the new Sticky Partitioner.
In this session, I’ll talk about how I ingest the data, followed by a look at the tools, including ksqlDB and Kafka Connect, that will help transform the raw data into useful information.
In this talk, we’ll step through the basics of stream processing through ksqlDB, a Kafka-native, SQL-based stream processor. You’ll learn about its core abstractions, how it works, and how you can use it to build modern data pipelines.
In this talk, I'll explain what we call inbound and outbound Kafka topics and use those concepts as the launching pad to discuss:
US Government agencies are required to share large volumes of data to enable them to execute on their critical missions. Sharing data across agencies is required for implementing US immigration and naturalization processes and issuing passports and visas.
In this talk, we will first set the scene with a geospatial 101. Then, using a simplified taxi hailing use case, we will look at two approaches for processing spatial data with Kafka Streams.
In this session we will showcase how Confluent and Slower partner together to help customers overcome challenges and realize the true value of Confluent Cloud.
Stream processing is becoming increasingly essential for extracting business value from data in real-time. To achieve strict user-defined SLAs under constantly changing workloads, modern streaming systems have started taking advantage of the cloud for scalable and resilient resources.
In this talk, we will describe the evolution of change data capture based ingestion in Robinhood not only in terms of the scale of data stored and queries made, but also the use cases that it supports. We will go in-depth into the CDC architecture built around our Kafka ecosystem.
We’ll talk about several topics including (a) monitoring Kafka health, (b) optimizing Kafka to address compute, storage and networking bottlenecks, (c) automating detection and mitigation of infrastructure failures related to compute, storage and networking and (d) continuous software patching.
I will go over how to stretch a Kafka cluster across the old and new Kubernetes clusters without adding any extra brokers. Finally, I will discuss how the Kafka brokers in the new Kubernetes cluster get scaled up while the old one gets decommissioned.
This talk will look at: Why is this happening? Who is involved? How does the process work? What progress has been made? When can we expect to see a standard?
This talk will cover the key concepts of stream processing theory as we understand them today. It is simultaneously an introductory talk as well as an advanced survey on the breadth of stream processing theory. Anyone with an interest in streaming should find something engaging within.
This talk is for data architects who are not afraid of some code and for data engineers who love open source and cloud services.
In this presentation, I hope to share the discoveries I made over the years in this area, as well as working practices and patterns I’ve seen.
In this talk, Kenny Gorman and Elena Cuevas will present how Apache Kafka on Confluent Cloud can stream massive amounts of data to Time Series Collections via the MongoDB Connector for Apache Kafka.
What are the options offered by the Kafka built-in Authorizer, how can the Authorizer be customized and how are integrations with external systems built in order to provide group or role-based access control?
In this session, Viktor talks about Testcontainers, a library (initially created for the JVM, now available in many languages) that provides lightweight, disposable instances of shared databases, clusters, and anything else that can run in a Docker container!
In this talk, I’ll share why the next wave of successful data companies will follow the same pattern. Rather than trying to change how we work, they’ll find ways to unambiguously improve it.
This talk describes our journey of ingesting multiple Kafka data streams from thousands of topics and about half a million partitions and storing them as Apache Iceberg datasets, and explains the issues we hit along the way.
This talk will unveil the next generation of the consumer rebalance protocol for Apache Kafka (KIP-848) that addresses the shortcomings of the current protocol. We will go through the evolution of the current rebalance protocol, discuss its shortcomings, and present the new rebalance protocol.
In this talk, we will give an introduction to NLP, focussing on the concepts of STT, Text Generation and TTS. Using live demos, we will guide you through the process of scraping social media comments, training a text generation model, synthesizing millions of voices and building IoT robot heads.
During this demo-driven talk, you will experience how to benefit from
This talk first explores the "classic streaming stack," based on the Lambda architecture, its origin, and why it didn't pick up amongst data-driven organizations. The modern streaming stack (MSS) is a lean, cloud-native, and economical alternative to classic streaming architectures.
In this talk, we'll discuss how the oNote team implemented a point-in-time queryable Event Model repository using Kafka, Git, and CRDTs. We'll also discuss some other technologies that facilitate this pattern.
In this talk, we introduce a workflow engine concept that only uses Kafka to persist state transitions and execution results. The system banks on Kafka’s high reliability, transactionality, and high scale to keep setup and operating costs low.
I’ll take you through the basics of Kafka—the brokers, the partitions, the topics—and then on and up into the different APIs and tools available to work with it. Consider it a Kafka 101, if you will. We’ll stay at a high level, but we’ll cover a lot of ground.
In a live demo, we will introduce an eBPF-based, always-on, CPU profiler to visualize what your Kafka applications are spending time on. We will analyze how much time the Kafka broker spends on handling different requests and responding to polling.
Using Apache Kafka and Confluent Cloud as a case study, we will dig deeper into how to define good SLOs and SLAs for distributed systems. From there we will discuss ways to improve availability and the changes we made to Confluent Cloud to improve on Kafka's availability story.
In this session, we will get into the weeds of data serialization with schemas. We will discuss the differences between formats like JSON, Avro, Thrift, and Protocol Buffers, and how your code must use each one of them to serialize data.
In this talk, I'll introduce Apache Flink's approach to unified stream and batch processing and discuss - by example - how these scenarios can already be addressed today and what might be possible in the future.
During this session, you'll learn about how to communicate the value of technology decisions to non-technical co-workers or stakeholders. And we'll talk about some very specific buy-in, enablement, and adoption activities and suggestions for supporting streaming implementations.
In this talk, we plan to share our near-real-time ingestion system built on top of Apache Kafka, Apache Flink, and Apache Iceberg. We pick ANSI SQL as the common currency to minimize the "lambda architecture" learning curve for teams adopting fresh, near-real-time data.
This session will describe how and why we built Wikimedia's Event Data Platform using Kafka, JSON and JSONSchemas, and how we make our event data available to the world.
In this talk, Adam covers the main considerations of modeling and implementing events. Data is often modeled as a Fact or a Delta, though the distinction isn't always clear.
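To make the distinction concrete, here is a minimal sketch under one common reading of the two terms, which is an assumption and not taken from the talk: a fact event carries the complete state of an entity at a point in time, while a delta event describes only a change that the consumer must apply to the state it already holds. The shopping-cart domain, field names, and event types are hypothetical.

```python
from typing import Any, Callable, Dict, Iterable, List

State = Dict[str, List[str]]
Event = Dict[str, Any]


def apply_fact(state: State, event: Event) -> None:
    # A fact replaces whatever was known before: the latest event wins.
    state[event["cart_id"]] = list(event["items"])


def apply_delta(state: State, event: Event) -> None:
    # A delta only makes sense relative to previously accumulated state.
    items = state.setdefault(event["cart_id"], [])
    if event["type"] == "item_added":
        items.append(event["item"])
    elif event["type"] == "item_removed":
        items.remove(event["item"])
    else:
        raise ValueError(f"unknown delta type: {event['type']}")


def replay(events: Iterable[Event], apply: Callable[[State, Event], None]) -> State:
    # Rebuild consumer state by applying events in order.
    state: State = {}
    for event in events:
        apply(state, event)
    return state


# The same cart history, modeled both ways (hypothetical data).
fact_stream = [
    {"cart_id": "c1", "items": ["book"]},
    {"cart_id": "c1", "items": ["book", "pen"]},
    {"cart_id": "c1", "items": ["pen"]},
]
delta_stream = [
    {"cart_id": "c1", "type": "item_added", "item": "book"},
    {"cart_id": "c1", "type": "item_added", "item": "pen"},
    {"cart_id": "c1", "type": "item_removed", "item": "book"},
]

from_facts = replay(fact_stream, apply_fact)
from_deltas = replay(delta_stream, apply_delta)
```

Both replays end in the same state, but only the fact stream lets a consumer recover it from the most recent event alone; the delta stream needs the full ordered history.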
This talk elaborates on the challenges that Twilio faced when building such a monitoring platform, which can aggregate customer data and send alerts in a timely manner within SLA.
This session details the journey of moving standalone Kafka to Kafka on K8S. During the session, the scope of the journey, including Total Cost of Ownership (TCO), technical architecture, and the migration itself, will be discussed.
In this talk, we go over the history and future of Apache Flink adoption at Shopify.
We’ll talk about how and why we went from choosing Apache Flink as the replacement for our existing streaming technologies in 2021, to a year later with a flourishing streaming community.
In this talk we'll discuss mechanisms you can use to balance your data, such as keys, composite message keys, the role of hashing, custom partitions, and other things you need to keep in mind when splitting data across partitions.
In this live-coding lightning talk, we'll start from scratch and build a streaming graph data pipeline from start to finish. With our data in Kafka, Quine plugs in and requires just a graph query written in the Cypher graph query language.
We will share how we: -- drive the data streaming readiness by standardizing Kafka clusters among divergent payment application demands. -- overcome the challenge of designing and implementing Kafka enterprise infrastructure to meet business requirements
This talk discusses how we built libraries, templated microservices, and tooling that leverage Postgres and Kafka for safely dealing with dead letters, inspecting and querying them, and republishing them to retry Kafka topics for safe reprocessing at a later time.
In this talk, we will discuss how we overcame these challenges and delivered a fully automated and robust data exchange solution by extending Kafka Connect, leveraging ksqlDB streams/tables and aggregations, and developing custom microservices.
In this talk, we'll discuss the actual implementation details for the clients and topics that live in multi-cluster environments, including: What naming conventions and patterns should be followed for topics in a multi-cluster architecture? How does this differ between application?
In this talk, we will go over how Ducktape solves the problem of multi-service distributed testing, what type of testing it is designed for, and how it simplifies the testing experience for complex real time systems. Get ready to get your hands dirty and learn how to write a test and a service.
In this talk, we will learn how we leveraged Kafka and Druid to provide real-time aggregations of spend against both daily and lifetime budgets. This led to significant decreases in overdelivery compared to the previous batch system, and savings of $LARGE_NUMBER_OF_DOLLARS
In this talk, we will cover the changes to the threading model that made more dynamic error handling possible. We will also introduce the Streams handler, which unlocked options to react immediately in cases that would previously cause cascading thread death.
This talk will discuss practical tips for architecting and productionalizing scalable and latent data applications that leverage the PubSub model. Attendees will learn about common data messaging capabilities found through the PubSub model and how to leverage PubSub to optimize the performance
We’ll discuss streaming ingestion into Snowflake with Snowpipe Streaming and how we utilized it with the Snowflake Sink Connector for Kafka. We will talk about the improvements and then jump into a demo which uses Docker containers to spin up a Kafka and Kafka Connect environment to load data
We found that by using the "agent" concept in Faust we could provide our engineers with a "Function as a Service"-like experience specifically for processing events on Kafka streams.
In this talk we will discuss scaling and planning a system to meet the streaming demands of the world’s only exascale and most energy efficient supercomputer. Tune in to learn more about HPC and how streaming fits in to monitoring large-scale systems.
In this session we will look at how to leverage the Python libraries River and Bytewax to build streaming applications on Kafka that use online machine learning techniques.
This talk provides a work-in-progress update of deploying Kafka on aarch64 Linux. Although the new Apple M1 is ARMv8 based, it has a distinct flavor, or ELF format - arm64. Since much of Kafka consists of noarch rpms, or simply, a bag-o-jars, both Linux and macOS have native implementations of Java
This talk covers centralizing the logs generated in live games, especially MMORPGs, detecting operational anomalies with more than 300 patterns in ksqlDB, and sharing the know-how gained from game operations.
How can the DOD manage military battlefield assets, including integrating signals from a diverse and dynamic set of sensors, such as static ground sensors and soldier-worn sensors, to provide predictive and operational analytics?
In this presentation, we are going to deep dive into the internals of Kafka log mechanisms. We will look in detail at the structure of the commit log and segments, the arrangement of topic partitions on disk, and log retention under the compact and delete policies.
In this talk, we aim to highlight the importance of integration testing, a critical verification method for stable and reliable large-scale distributed streaming applications. We will also provide a high level overview of our system, challenges faced in moving to a streaming infrastructure
When NOT to use Apache Kafka? What limitations does the event streaming platform have? When does Kafka simply not provide the needed capabilities? How do you qualify Kafka out when it is not the right tool for the job? This session explores the DOs and DON'Ts.