Best Practices for Apache Kafka® in Production: Confluent Online Talk Series

When you use Apache Kafka to run your most critical applications, you want to get it right the first time.

In this online talk series, we’ll share war stories, lessons learned, and best practices for running Kafka in production. Starting with recommendations for deployment architecture, we’ll explain what you need to know when designing reliable streaming applications or spanning multiple data centers. We’ll then turn to planning for critical failures and, most importantly, how to monitor your clusters and applications to ensure their performance and reliability.

Join Gwen Shapira for a 5-part series where she will lead you through all the best practices for deploying Apache Kafka in production environments.

Speaker:

Gwen is a product manager at Confluent, where she manages Confluent Platform, a stream data platform powered by Apache Kafka. She is an active Apache Kafka committer and developer. Gwen has 15 years of experience working with code and customers to build scalable data architectures, integrating relational and big data technologies.

Available on-demand:

Part 1: Deploying Confluent Platform in Production

Choosing the right deployment model is critical to successfully running a scalable streaming platform in production. Selecting the right hardware or cloud deployment architecture for each use case is important to ensure that the system reliably provides high-throughput and low-latency data streams.

In this talk, Gwen will describe the reference architecture of Confluent Enterprise, which is the most complete platform to build enterprise-scale streaming pipelines using Apache Kafka. This talk is intended for data architects and system administrators planning to deploy Apache Kafka in production.

Part 2: Reliability Guarantees in Apache Kafka

In the financial industry, losing data is unacceptable. Financial firms are adopting Kafka for their critical applications. Kafka provides the low latency, high throughput, high availability, and scale that these applications require. But can it also provide complete reliability? As a system architect, when asked “Can you guarantee that we will always get every transaction?” you want to be able to say “Yes” with total confidence.

In this session, we will go over everything that happens to a message on its way from producer to consumer, and pinpoint all the places where data can be lost if you are not careful. You will learn how developers and operations teams can work together to build a bulletproof data pipeline with Kafka. And if you need proof that you built a reliable system, we’ll show you how to build the system so that it can prove this too.
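As a rough illustration of the kind of places where data can be lost, the sketch below checks a handful of producer, topic, and consumer settings that are commonly associated with message loss. The abstract does not list specific settings, so the selection here is an assumption. The configuration keys are standard Kafka setting names, but the function `find_data_loss_risks`, its thresholds, and its choice to treat unset values as risky are hypothetical and not taken from the talk.

```python
# Hypothetical checker: flags settings that are commonly associated with data loss.
# Thresholds (replication factor 3, min ISR 2) are illustrative assumptions.
# Unset values are flagged because defaults vary between Kafka versions.

def find_data_loss_risks(producer, topic, consumer):
    """Return a list of warnings for settings that can lose messages."""
    risks = []

    # Producer side: a send acknowledged only by the leader can vanish
    # if the leader fails before followers copy the message.
    if str(producer.get("acks")) not in ("all", "-1"):
        risks.append("producer: acks is not 'all'")
    if int(producer.get("retries", 0)) < 1:
        risks.append("producer: retries is disabled")

    # Topic side: too few replicas, or too few in-sync replicas required.
    if int(topic.get("replication.factor", 1)) < 3:
        risks.append("topic: replication.factor is below 3")
    if int(topic.get("min.insync.replicas", 1)) < 2:
        risks.append("topic: min.insync.replicas is below 2")
    if str(topic.get("unclean.leader.election.enable", "true")).lower() != "false":
        risks.append("topic: unclean leader election is not disabled")

    # Consumer side: auto-commit can commit offsets for messages
    # that were fetched but not yet fully processed.
    if str(consumer.get("enable.auto.commit", "true")).lower() != "false":
        risks.append("consumer: enable.auto.commit is not disabled")

    return risks


# Example with made-up settings for one pipeline.
example_risks = find_data_loss_risks(
    producer={"acks": "1", "retries": 5},
    topic={"replication.factor": 3, "min.insync.replicas": 1},
    consumer={"enable.auto.commit": "false"},
)
for warning in example_risks:
    print(warning)
```

A check like this only covers configuration. It says nothing about application-level mistakes, such as ignoring send errors, which is part of why developers and operations teams need to work together.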

Part 3: Common Patterns of Multi Data-Center Architectures with Apache Kafka

Whether you already know you want to run Apache Kafka in multiple data centers and need practical advice, or you are wondering why some organizations even need more than one cluster, this online talk is for you.

In this short session, we’ll discuss the basic patterns of multi-datacenter Kafka architectures, explore some of the use cases enabled by each architecture, and show how Confluent Enterprise products make these patterns easy to implement.

Part 4: Disaster Recovery Plans for Apache Kafka

Running Apache Kafka in production is only the first step in the Kafka operations journey. Professional Kafka users are ready to handle all possible disasters, because for most businesses having a disaster recovery plan is not optional.

In this session, we’ll discuss disaster scenarios that can take down entire Kafka clusters and share advice on how to plan for, prepare for, and handle these events. This is a technical session full of best practices. We want to make sure you are ready to handle the worst mayhem that nature and auditors can cause.

Part 5: Metrics are Not Enough: Monitoring Apache Kafka and Streaming Applications

When you are running systems in production, clearly you want to make sure they are up and running at all times. But in a distributed system such as Apache Kafka… what does “up and running” even mean?

Experienced Apache Kafka users know what is important to monitor, which alerts are critical, and how to respond to them. They don’t just collect metrics. They go the extra mile and use additional tools to validate availability and performance across both the Kafka cluster and their entire data pipelines.

In this presentation, we’ll discuss best practices for monitoring Apache Kafka. We’ll look at which metrics are critical to alert on, which are useful in troubleshooting, and which may actually be misleading. We’ll review a few “worst practices”: common mistakes that you should avoid. We’ll then look at what metrics don’t tell you, and how to cover those essential gaps.
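The abstract does not name specific metrics, so as an illustration only, here is a minimal sketch of one measurement that is widely used for Kafka consumers: consumer lag, the difference between a partition’s log end offset and the offset a consumer group has committed. The function names, the sample offsets, and the alert threshold are all hypothetical. In practice the offsets would come from a Kafka client or admin tool rather than hard-coded dictionaries.

```python
# Minimal sketch: per-partition consumer lag from two offset snapshots.
# Inputs are plain dicts mapping partition number -> offset (hypothetical data).

def consumer_lag(end_offsets, committed_offsets):
    """Return lag per partition: log end offset minus committed offset."""
    lag = {}
    for partition, end in end_offsets.items():
        committed = committed_offsets.get(partition)
        if committed is None:
            # Assumption: with no committed offset, count the whole log as unread.
            # This ignores the log start offset, so it can overstate lag.
            lag[partition] = end
        else:
            lag[partition] = max(end - committed, 0)
    return lag


def partitions_over_threshold(lag, threshold):
    """Return the sorted partitions whose lag exceeds the alert threshold."""
    return sorted(p for p, n in lag.items() if n > threshold)


# Example with made-up offsets for a three-partition topic.
lag = consumer_lag(
    end_offsets={0: 100, 1: 50, 2: 10},
    committed_offsets={0: 90, 1: 50},
)
print(lag)
print(partitions_over_threshold(lag, threshold=5))
```

Lag also shows one way a metric can mislead: a lag of zero does not show that messages were processed correctly, only that offsets were committed. Closing that kind of gap typically requires end-to-end validation of the pipeline.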