View sessions and slides from Current 2023
In this keynote, leaders from the Kafka and Flink communities will highlight recent contributions, upcoming project improvements, and the innovative applications that users are building to shape the future of data streaming.
The keynote will present how a data streaming platform built on Kafka can enable organizations to stitch together data from across the business to produce high-value data assets that can be shared and used to support operational applications and data analytics, including exciting new use cases.
In this talk, we'll explore this dynamic synergy between Kafka and ClickHouse with a live demonstration leveraging OpenSky data. We’ll use ClickPipes, the ClickHouse Cloud native Kafka integration, for building an end-to-end real-time data processing and analytics solution.
We will talk about avoiding data loss with Flink's Kafka exactly-once producer, configuring Flink to get the most bang for your buck out of your memory configuration, and tuning for efficient checkpointing.
During this session, we explore selected event-driven architecture patterns commonly found in the field: the claim-check pattern, the content enricher pattern, the message translator pattern, and the outbox pattern.
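Of the four patterns named above, the outbox pattern is the one most often asked about. As a minimal, illustrative sketch of the idea only (not the session's material): SQLite stands in for the service database, a plain list stands in for the broker, and all names are hypothetical.

```python
import json
import sqlite3

# Outbox pattern: the business write and the "outbox" event commit in the
# SAME local transaction; a separate relay later reads the outbox table
# and publishes its rows to the broker.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
conn.execute("CREATE TABLE outbox (id INTEGER PRIMARY KEY, topic TEXT, payload TEXT)")

def place_order(order_id, total):
    # `with conn` wraps both inserts in one atomic transaction.
    with conn:
        conn.execute("INSERT INTO orders VALUES (?, ?)", (order_id, total))
        conn.execute(
            "INSERT INTO outbox (topic, payload) VALUES (?, ?)",
            ("orders", json.dumps({"id": order_id, "total": total})),
        )

published = []  # stand-in for the message broker

def relay_outbox():
    # The relay polls the outbox in insertion order, publishes each row,
    # then deletes what it has sent.
    rows = conn.execute("SELECT id, topic, payload FROM outbox ORDER BY id").fetchall()
    for row_id, topic, payload in rows:
        published.append((topic, json.loads(payload)))
        conn.execute("DELETE FROM outbox WHERE id = ?", (row_id,))
    conn.commit()

place_order(1, 9.99)
relay_outbox()
```

Because the event row commits or rolls back together with the business row, the relay can never publish a change that was never persisted, which is the core guarantee of the pattern.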
In this talk, I will share some simple optimization techniques you can apply with streaming SQL in just a few minutes that can cut costs by 10x or even 100x. Then, we’ll gradually dive deeper into some novel optimization techniques that can be applied across your distributed storage.
In this presentation, I will talk about how we solved the problem of cataloging and discovery using Datahub as our discovery platform. I will cover the details of how we went about ingesting metadata from a plethora of infrastructure and platform components.
In this talk, attendees will walk away with an understanding of the current challenges of building a low-latency medallion architecture, how the record index and incremental updates work in Apache Hudi, and how the new Hudi CDC feature unlocks incremental processing on the lake.
In this session, you will gain an understanding of the importance of end-to-end traceability, and several tools & examples for improving observability in your own distributed event driven applications.
Join this session to learn about Apache Druid and why companies use it in combination with Kafka and Flink for real-time applications. Learn how Apache Druid complements Flink and Kafka - and what makes it purpose-built for analyzing streams and events.
Cloudera integrates NiFi, Kafka, and Flink into a single platform bringing unparalleled speed and flexibility to your AI pipelines.
In this session, we will discuss some recent developments in Generative AI and how those can be leveraged to build intelligent applications. Learn how to bring the power of large language models (LLMs) to your private, real-time operational data across multiple data types.
Come to this talk to understand the forces that have given rise to this class of database, learn about Pinot's internals, and see some examples of it in action.
In this talk, we will walk through the steps to implement a real-time anomaly detection system on time series data using Apache Flink. We will implement and compare two algorithms from an academic paper: Exponentially Weighted Moving Average (EWMA) and Probabilistic EWMA (PEWMA).
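For a flavor of the simpler of the two algorithms, here is a minimal EWMA-based detector in plain Python (an illustration only, not the talk's Flink implementation; PEWMA additionally down-weights each update by the probability of the observation).

```python
import math

def ewma_anomalies(series, alpha=0.3, k=3.0):
    """Flag indices whose deviation from the running EWMA exceeds k std devs.

    Tracks an exponentially weighted mean and variance; a point is checked
    against the estimate BEFORE it is folded in, so a spike cannot mask itself.
    """
    mean, var, anomalies = series[0], 0.0, []
    for i, x in enumerate(series[1:], start=1):
        std = math.sqrt(var)
        if std > 0 and abs(x - mean) > k * std:
            anomalies.append(i)
        # Standard EWMA updates: mean_t = mean + a*diff,
        # var_t = (1-a) * (var + a*diff^2).
        diff = x - mean
        incr = alpha * diff
        mean += incr
        var = (1 - alpha) * (var + diff * incr)
    return anomalies

# A noisy flat series with one spike at index 5.
data = [10.0, 10.2, 9.85, 10.1, 9.9, 25.0, 10.0, 10.2]
print(ewma_anomalies(data))  # → [5]
```

Note that after the spike the variance estimate balloons, so a streaming deployment would typically also decay or cap the variance to recover sensitivity quickly.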
In this talk, we will dive into all aspects of the new protocol, look into the architecture of Apache Kafka's brand new group coordinator, discuss the upgrade path for both the consumers and the brokers, and finally update the community about where we stand in the development.
Join to learn how MQTT and Kafka combine to handle millions of data points with ease, allowing Rimac to deliver a scalable and seamless customer experience. Attendees will glean insights into the architectural strategies that manage vast device networks and high-velocity data.
In this session, we’ll explore the topic of auto-scaling by implementing a strategy for Confluent Cloud resources. We’ll first discuss common use cases that dictate a need to create a scaling strategy for Confluent Cloud and introduce the approaches best suited for each use case.
In this talk we will show you the latest feature-packed Lenses open-source S3 source and sink connectors, patterns and best practices for backing up to and restoring from S3, and how to get a seamless, one-click backup-and-restore experience in Lenses 5.3.
By shifting our processes to an Event Driven Architecture, one business line at a time, we hope to set the stage to revisit our critical data query models, while highlighting and correcting the data quality issues that inevitably built up over decades.
In this talk, we will showcase strategies, patterns, and techniques that were developed during an effort to abstract (and strangle) 20+ internal systems into a unified data integration platform.
After the session, you'll understand the importance of the underlying JVM and how you can leverage this knowledge to boost the performance of the cluster to achieve better SLAs or reduce infrastructure costs.
In this talk we show how data and machine learning teams can rapidly prototype and deploy real-time ML apps, ingesting real-time data with the help of Apache Kafka® and Airy, an open-source app framework. We will discuss different options for fine-tuning LLMs and "chaining" them with other ML models.
Jeremy will showcase a live demo, illustrating the tangible advantages: fewer managed data systems, reduced latency, simpler coding, and interactive UIs. Whether a Kafka novice or a seasoned user, walk away from this session with actionable takeaways and tools for your business.
In this talk we will cover the architecture of our Kafka Streams layer that makes it possible to use external data feeds as rule input, how we handle dynamic criteria for joins and filters, best practices for writing dynamic rule engines in Kafka Streams and upcoming improvements to Kafka Streams.
In this session, we will talk about the roles of data engineers, why data engineering is critical to the success of data organizations, and how to build a winning data engineering culture that empowers both data engineers and partners.
In this talk, we'll share several experiences setting up a COE for large industrial companies and in insurance and logistics environments, from establishing a strong foundation, defining event designs, best practices, and principles, to guiding development teams.
This presentation will look at the best practices to configure Kafka Connect to output important data when evaluating if more resources are needed for a Kafka Connect worker or if a new node should be added to the cluster overall.
In this session, we'll share our experiences and lessons learned working with both technologies to ingest data from Kafka into our Iceberg data lake at near-real-time speeds.
In this session you'll learn how this innovative scheme of interleaving snapshot queries and log-based change events works under the hood and how it solves common tasks when running CDC pipelines. We'll also discuss advanced topics like parallelising snapshots and customising snapshot contents.
In this talk, we'll explore just how "declarative" we can make streaming data pipelines on Kubernetes. I'll show how we can go deeper by adding more and more operators to the stack. How deep can we go?
In this talk, we will discuss how we have tackled this problem head-on with a fully automated degraded storage detection and remediation system. We’ll highlight the importance of monitoring storage performance and take a deep-dive into how we formulated the detection algorithm.
At Goldsky, we needed a way to configure CDC for a large Postgres database dynamically: the list of tables to ingest is driven by customer-facing features and is constantly changing.
We will discuss how we at Pinterest transformed real time user engagement event consumption.
Rakesh discusses how Lyft organically evolved and scaled the streaming platform that provides a consistent view of the marketplace, helping individual teams independently run their optimizations.
In this session, you’ll see how the Flink and Kafka communities are uniting to tackle these long-standing technical debts. We’ll introduce the basics of how Flink achieves EOS with external systems and explore the common hurdles that are encountered when implementing distributed transactions.
In this session, we’ll explore transforming signals from the time domain to the frequency domain using FFT, maximizing the level of compression of input signals while building a precise frequency alert system.
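As a toy illustration of the time-to-frequency step described above (a naive O(n²) DFT stands in for the FFT here; a real pipeline would use an FFT library, and the alert system would compare the dominant frequency against thresholds):

```python
import cmath
import math

def dft(samples):
    """Naive discrete Fourier transform; an FFT computes the same result in O(n log n)."""
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

def dominant_frequency(samples, sample_rate):
    """Return the frequency (Hz) of the strongest non-DC bin below Nyquist."""
    spectrum = dft(samples)
    half = len(samples) // 2
    k = max(range(1, half), key=lambda i: abs(spectrum[i]))
    # Bin k corresponds to k * sample_rate / n hertz.
    return k * sample_rate / len(samples)

# 64 samples of a 5 Hz sine sampled at 64 Hz: the 5 Hz bin dominates.
rate = 64
signal = [math.sin(2 * math.pi * 5 * t / rate) for t in range(rate)]
print(dominant_frequency(signal, rate))  # → 5.0
```

Keeping only the strongest bins is also where the compression mentioned in the abstract comes from: a narrowband signal that needs many time-domain samples collapses to a handful of frequency-domain coefficients.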
In this session, we will explore the challenges that arise when building a modern streaming SQL engine like Flink SQL.
In this session, we will discuss our work in this regard using machine learning. We will discuss popular lag patterns and how our ensemble forecasting system learns from the past and predicts future trends. We will also showcase some case studies and benefits of having such a system.
In this session, Mike Rosam and Tun Shwe will share their experiences of building data teams at McLaren and fast growth startups. They will take you on a journey of how they navigated their way from the old batch world to the new streaming world.
This session explores how Apache Flink can narrow the gap between batch and streaming by keeping the same data pipelines definition while the underlying technology evolves.
In this session, we’ll walk through a real-world example of capturing changes made in a relational database with a connector configured to use Apicurio Registry, publishing those changes to Kafka serialized in Avro’s compact binary form, and utilizing Kafka Streams, Quarkus, and Camel-K.
In this talk, we’ll explore the problems you’ll experience from your Kafka infrastructure expanding and many clever solutions to mitigate them.
In this talk, you will learn to build a Streamlit data application to help visualize the ROI of different advertising spends of an example organization.
In this talk, we will discuss architectural choices, challenges, and lessons learned in adapting Kafka for open science and open data. Our novel approach to OpenID Connect / OAuth2 in Kafka is designed to securely scale Kafka from access inside a single organization to access by the general public.
In this talk, we will explore how Confluent and Amazon Web Services (AWS) work together to help you in the journey of data modernization and innovation.
By the end of this talk, attendees will have a solid understanding of Flink connectors, the connector interface, and be better equipped to build efficient and reliable data processing pipelines with Flink.
This presentation will discuss the importance of optimizing and choosing storage engines for Kafka streams applications.
In this talk we'll discuss what bottlenecks we have hit as we scaled out, and what measures we took to remove them, such as replicating data based on Kafka Headers, connecting to many source and destination Kafka clusters, managing the replication of Kafka topics of varying traffic.
This session is targeted towards developers interested in learning how to use Kafka as the data plane for their MQTT broker infrastructure, without needing to run separate MQTT brokers.
In this talk, Schabowsky will introduce the causes and symptoms of Kafkatosis, such as a lack of stream reuse, inefficient application onboarding, and operational disruptions. He will help you understand the nature and impact of these problems through real-world examples.
In this talk, we describe how the Nucleus engineering team built a real time, user-facing analytics app.
If you’re in discussions surrounding event driven systems at your organization then this talk is for you. Join Ronak and me for this talk and let’s have a discussion.
Event streaming platforms like Kafka have traditionally leaned on ZooKeeper as the cornerstone for coordination and metadata management. This presentation introduces Oxia, a compelling alternative solution.
In this talk, we will present a real-time analytics architecture we implemented in the Rockset database, based on RocksDB, that effectively isolates streaming data ingestion from query serving.
We will talk about the principles followed in building the feature, the journey of deploying and running it in our production clusters with different workloads, and the learnings from running it in production at a large scale, which led to a few interesting features extended from KIP-405.
This talk will go into connecting Apache Kafka and InfluxDB and the why, how, and what you can accomplish by doing so.
In this talk I describe partition multihoming (PMH), a form of virtual partitioning where two or more physical Kafka partitions are guaranteed to be consumed by the same consumer instance.
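Since PMH is the speaker's own construct, the following is only a guess at the invariant rather than the talk's design: a "home-aware" assignment that hashes each partition's home id onto the consumer list, so partitions sharing a home always land on the same consumer instance (all names hypothetical).

```python
import hashlib
from collections import defaultdict

def multihome_assign(partitions, consumers, homes):
    """Assign partitions so that all partitions sharing a 'home' id go to one consumer.

    `homes` maps partition -> home id. Hashing the home id (not the partition
    number) guarantees co-location of every partition in the same home group.
    """
    assignment = defaultdict(list)
    for p in partitions:
        digest = hashlib.sha256(str(homes[p]).encode()).hexdigest()
        idx = int(digest, 16) % len(consumers)
        assignment[consumers[idx]].append(p)
    return dict(assignment)

# Partitions 0 and 3 are multihomed together (home "a"), as are 1 and 2 ("b").
homes = {0: "a", 1: "b", 2: "b", 3: "a"}
result = multihome_assign([0, 1, 2, 3], ["c1", "c2", "c3"], homes)
```

In real Kafka this logic would live in a custom partition assignor on the consumer side; the sketch only shows the co-location guarantee itself.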
By the end of our talk, attendees will leave with an understanding of the latest RTML techniques and the essential factors to consider when designing and implementing real-time machine-learning solutions.
If you've ever provided a customer with an analytical report that differed from their operational conclusions, then this talk is for you.
This talk will present the OpenMessaging Benchmark, why it was created, and how one can use it to model messaging workloads and verify the behavior of different systems.
In this talk, we'll present how Pluralsight is replacing streaming systems with an operational data warehouse for their real-time use cases. Today, Materialize powers Pluralsight's Plan Analytics and core data models for their content offerings.
In this session you will learn how AWS streaming data services can power your streaming applications that can scale to virtually unlimited storage with Tiered Storage. Discover how AWS services collaborate to address diverse EDAs, CDC applications and real-time analytics use cases.
By attending this talk, attendees will be able to take our learnings from making Confluent Cloud latencies 10x better and possibly apply similar principles to their cloud native data streaming systems.
Join me to learn how IBM Event Automation, a composable solution, puts your events to work by enabling both business and IT users to detect scenarios, act in real time, and automate decisions.
In this talk, we will introduce the audience to the world of querying streaming data on Apache Kafka with SQL, compare and contrast the features and capabilities of each of these tools, and provide an in-depth analysis of their respective pros and cons.
In this talk, we'll run through our Kafka infrastructure at Shopify and how clients connect to it. Next, we'll describe our solution for performing failovers using DNS. Afterwards, we'll look at some real world scenarios where this system saved us from major outages.
In this session, we share ideas from a novel system we are developing, called 'Restate'. Our work is inspired by event-sourcing and stream processing systems, but rethought from the ground up for microservices.
If your organization is looking to centralize Kafka consumption logic to a singular client library (instead of multiple different client libraries), please attend this talk to see how Robinhood does it so that the infrastructure team can focus development on a singular library.
We implemented a workflow rule engine that allows users to define rules and conditions to specify the applicable workflows for assets, based on their types, metadata and states.
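A rule engine like the one described can be sketched in a few lines; this is a rough, hypothetical illustration (field and workflow names invented), not the actual implementation.

```python
def matching_workflows(asset, rules):
    """Return workflows whose every condition matches the asset's fields.

    Each rule is {"workflow": name, "when": {field: required_value, ...}};
    a rule applies when all of its `when` fields equal the asset's values,
    so conditions can target type, metadata, or state uniformly.
    """
    return [
        rule["workflow"]
        for rule in rules
        if all(asset.get(field) == value for field, value in rule["when"].items())
    ]

rules = [
    {"workflow": "review",  "when": {"type": "dataset", "state": "draft"}},
    {"workflow": "archive", "when": {"state": "deprecated"}},
    {"workflow": "publish", "when": {"type": "dataset", "state": "approved"}},
]
asset = {"type": "dataset", "state": "draft", "owner": "data-team"}
print(matching_workflows(asset, rules))  # → ['review']
```

Treating rules as data rather than code is what lets users define and change them without a deployment.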
In this session, we will dive into Kafka’s network, storage, compute costs, and more to show you how to calculate and anticipate the bill for your Kafka deployment. And to take it a step further, we’ll explore what Confluent has done to reduce Kafka cloud costs for ourselves and our customers.
In this talk, we'll share how we tackled the challenge of building a fully managed robust data pipeline using a combination of streaming analytics, batch processing, data lake, and machine learning.
In this talk, we first present the state of the art in Role-Based Access Control for streaming data in Apache Kafka. We then present a novel approach where we bring the same RBAC concepts from relational systems to the data in motion space and compare it with the current solutions.
Join me to understand what side effects are and why they matter. You’ll learn to spot them in your own code. You’ll see how they sneak into our tests, our APIs and our systems’ designs and make everything harder. Then we’ll look at our industry’s many solutions to side effects.
This session kicks off with a technical, no-nonsense introduction to the lakehouse concept, dives deep into the lakehouse architecture and recaps how a data lakehouse is built from the ground up with streaming as a first-class citizen.
We will cover topics such as Flink application lifecycle management, Flink SQL development, multi-tenancy, security, cost optimization, business continuity, customer support, deployment options, and how they are supported in our product Ververica Platform.
In this session, we’ll discuss how we see streaming at dbt Labs. We will dive into how we are extending dbt to support low-latency scenarios and the recent additions we have made to make batch and streaming allies in a DAG rather than archenemies.
By the end of the panel, you’ll be able to make a more informed decision choosing a streaming technology for your next project!
Learn how Striim architected and manages a unified data streaming platform optimized for fast deployment of event stream delivery and processing pipelines that deliver real-time analytics and AI for business use cases.
Chris will explore how organizations can leverage real-time insights to reduce waste, conserve resources, lower carbon footprints, and reduce operational expenses. He will also discuss the ethical considerations around collecting and using environmental data.
Come join our interactive session as we trip the light fantastic in this colorful eye-opening journey into the event streaming dream.
In this talk, we will explore the internal architecture of Kafka Streams to set you up for successfully running and tuning your applications: What does the internal threading model look like? How are partitions assigned and mapped to tasks? Why are there multiple internal consumers?
In this session, we'll have a gentle introduction to Apache Kafka, and then a survey of some of the more popular components in the Kafka ecosystem. We'll look at the Kafka Producer and Consumer libraries, Kafka Connect, Kafka Streams, the Confluent Schema Registry, and more.
In this talk, we will share our experiences to explain why state-of-the-art systems offer poor abstractions to tackle such workloads and why they suffer from poor cost-performance tradeoffs and significant complexity.
This talk will give a refresher on transactions and idempotency and chronicle the various KIPs that improved the protocol over the years. We will also discuss the problem of hanging transactions and how KIP-890 hopes to solve it as well as strengthen the transactional protocol overall.
In this session, we will describe an architecture that addresses simplicity and performance in stream processing deployments, while also reducing cost. This architecture aims for fewer moving parts, fewer clusters and servers to manage, fewer network hops, higher throughput, and lower latency.
Let's uncover what you should be monitoring, why you should be monitoring it, and leave you with properly monitored Kafka Streams applications.
Traditional data pipelines face scalability and cost challenges due to monolithic design and batch processing. They assume all data must be stored in one location, leading to time-consuming, expensive, and error-prone processes.
During this talk, we'll bring these principles to life with real-world examples and demos.
In this talk, we look at textbook examples of using Kafka at scale. Specifically focused on Evernorth Health Service's journey of implementing microservices data pipelines, we provide an overview of the patterns we used while implementing CDC data pipelines for these microservices using Kafka.
In this talk, I will cover building an Interactive Query service, including routing queries between app instances, creating custom queries using Interactive Query v2, and testing your IQ application.
This talk presents Venice and deep dives into how we designed it to enable high data ingestion volumes via Kafka, merging it all coherently from many data sources and many geographically distributed regions. We’ll cover how Venice’s conflict resolution strategy can be a powerful abstraction.
In this talk we'll cover an architectural overview of BYOC, a behind-the-scenes look at BYOC and its deployment strategies, and a demo of Redpanda BYOC.
Attendees will learn the benefits of serverless and see how it fits into the context of stream processing. We’ll then kick off a demo where we’ll focus on a real world production use case that uses Flink jobs to power an application with extremely low latency.
In this talk we will learn about the tradeoffs between the two technologies and how to implement various use cases in each architecture, including those that need a little more work.
We’ll celebrate the speakers (and attendees) who helped make this conference possible, highlight some of the best sessions from the past two days, and hand out the prestigious Data Streaming Awards... but that’s not all.
We’ll introduce the LSP, how it enables simple development of cross-cutting IDE features, and how we’ve adapted the LSP to handle Connector Configurations.
In this talk, I will demonstrate that it is impossible to accurately answer this question. Approaches such as Flink's batch mode are not able to accurately handle late events and indicate when in processing time a window closed. In fact, they practically guarantee feature leakage.
If you’ve ever wondered how to integrate dbt into your CI/CD processes (think automatic project linting, spinning up testing environments, parsing the manifest file), this session is for you!
This session offers lessons learned from contributing to open source projects, highlighting ways to engage regardless of technical expertise or engineering background.
This talk discusses how we removed single points of failure through investments in Kafka infrastructure and our client libraries, letting us support a multitude of requirements of various systems inside Robinhood.
In this talk, we will briefly explore Fluvii via its feature set and some simple examples so that you can confidently get started with it!
In this session, we will cover kafka-latency-analyzer, a script that can use configurable timestamps to produce quantile reports of latency (focused on average and tail) at a topic level, and optionally relay the results downstream to another Kafka topic for analysis.
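The timestamp-to-quantile step could look roughly like this (an illustrative nearest-rank sketch with invented names, not the actual kafka-latency-analyzer code):

```python
import math

def latency_quantiles(produce_ts, consume_ts, quantiles=(0.5, 0.95, 0.99)):
    """Per-record latency report (ms) from paired produce/consume timestamps.

    Computes the average plus nearest-rank quantiles, which is where the
    'tail' part of a latency report comes from.
    """
    latencies = sorted(c - p for p, c in zip(produce_ts, consume_ts))
    n = len(latencies)
    report = {"avg": sum(latencies) / n}
    for q in quantiles:
        # Nearest-rank: smallest latency with at least q*n values at or below it.
        rank = max(1, math.ceil(q * n))
        report[f"p{int(q * 100)}"] = latencies[rank - 1]
    return report

# One slow record (latency 100 ms) dominates the tail but barely moves p50.
produce = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
consume = [5, 12, 29, 33, 47, 52, 64, 78, 86, 190]
print(latency_quantiles(produce, consume))
```

The example shows why tail quantiles matter: a single 100 ms straggler leaves the median at 5 ms while pushing p95 and p99 to 100 ms.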
In this session, we show you how kash.py can be used to bring the two disparate worlds of files and streaming together - and thus not only save a lot of time and money hiring real-time and streaming experts, but also make your data scientists, like ours, start loving real-time.
This talk will encourage movement while we learn, to help bolster cognitive retention and offer a break from normal passive listening. So come learn from two Confluent CSTAs in the industry and make Kafka Streams upgrade strategies muscle memory.
This talk will explore how Confluent and Imply can be used to visualize streaming data in real-time. We will discuss visualization tools and options including line charts, bar charts, heat maps, geospatial, and scatter plots, and how they can be used to help humans understand data.
This session shows how we successfully measured and evolved our Kafka configuration, with the goal of giving users the best possible experience (and resilience for their data).
Move it, share it, bridge it, stage it, back it up, optimize it, bop it. Did you know you can do these things with geo-replication in Kafka? (Well, except bop it.)