Watch this six-part series of online talks presented by Kafka experts. You will learn the key considerations in building a scalable platform for real-time stream data processing, with Apache Kafka at its core.
This series is targeted to those who want to understand all the foundational concepts behind Apache Kafka, streaming data, and real-time processing on streams. The sequence begins with an introduction to Kafka, the popular streaming engine used by many large scale data environments, and continues all the way through to key production planning, architectural and operational methods to consider.
Whether you’re just getting started or have already built stream processing applications for critical business functions, you will find actionable tips and deep insights that will help your enterprise further derive important business value from your data systems.
Register for the series to receive an email with links to all the recordings.
Jay Kreps, Confluent CEO and Co-founder, Apache Kafka Co-creator
Modern businesses have data at their core, and this data is changing continuously. How can we harness this torrent of continuously changing data in real time? The answer is stream processing, and one system that has become a core hub for streaming data is Apache Kafka.
This presentation will give a brief introduction to Apache Kafka and describe its usage as a platform for streaming data. It will explain how Kafka serves as a foundation for both streaming data pipelines and applications that consume and process real-time data streams. It will introduce some of the newer components of Kafka that help make this possible, including Kafka Connect, a framework for capturing continuous data streams, and Kafka Streams, a lightweight stream processing library.
Jun Rao, Confluent Co-founder, Apache Kafka Co-creator
In the last few years, Apache Kafka has been used extensively in enterprises for real-time data collecting, delivering, and processing. This talk will provide a deep dive on some of the key internals that help make Kafka popular.
- Companies like LinkedIn are now sending more than 1 trillion messages per day to Kafka. Learn about the underlying design in Kafka that leads to such high throughput.
- Many companies (e.g., financial institutions) are now storing mission critical data in Kafka. Learn how Kafka supports high availability and durability through its built-in replication mechanism.
- One common use case of Kafka is for propagating updatable database records. Learn how a unique feature called compaction in Apache Kafka is designed to solve this kind of problem more naturally.
David Tucker, Director, Partner Engineering and Alliances
A stream processing platform is not an island unto itself; it must be connected to all of your existing data systems, applications, and sources. In this talk we will provide different options for integrating systems and applications with Apache Kafka, with a focus on the Kafka Connect framework and the ecosystem of Kafka connectors. We will discuss the intended use cases for Kafka Connect and share our experience and best practices for building large-scale data pipelines using Apache Kafka.
Neha Narkhede, Confluent CTO and Co-Founder, Apache Kafka Co-creator
At its core, stream processing is simple: read data in, process it, and maybe emit some data out. The core features needed to do stream processing include scalability and parallelism through data partitioning, fault tolerance and event processing order guarantees, support for stateful stream processing, and handy stream processing primitives such as windowing.
This talk will introduce Kafka Streams and help you understand how to map practical data problems to stream processing and how to write applications that process streams of data at scale using Kafka Streams. We will also cover what is stream processing, why one should care about stream processing, where Apache Kafka and Kafka Streams fit in, the hard parts of stream processing, and how Kafka Streams solves those problems; along with a concrete example of how these ideas tie together in Kafka Streams and in the big picture of your data center.
Michael Noll, Product Manager
Why are there so many stream processing frameworks that each define their own terminology? Are the components of each comparable? Why do you need to know about spouts or DStreams just to process a simple sequence of records? Depending on your application’s requirements, you may not need a full framework at all.
Processing and understanding your data to create business value is the ultimate goal of a stream data platform. In this talk we will survey the stream processing landscape, the dimensions along which to evaluate stream processing technologies, and how they integrate with Apache Kafka. Particularly, we will learn how Kafka Streams, the built-in stream processing engine of Apache Kafka, compares to other stream processing systems that require a separate processing infrastructure.
Roger Hoover, Engineer, Confluent
The previous talks in this series cover components of the Kafka ecosystem and stream processing in general. This talk will focus on how to integrate all these components into an enterprise environment and what things you need to consider as you move into production.
We will touch on the following topics:
- Patterns for integrating with existing data systems and applications
- Metadata management at enterprise scale
- Tradeoffs in performance, cost, availability and fault tolerance
- Choosing which cross-datacenter replication patterns fit with your application
- Considerations for operating Kafka-based data pipelines in production