The Details That Matter: Kafka in Production, at Scale

Kafka Summit London 2022

Are you running Kafka at scale? Have you experienced “voodoo problems” in your infrastructure? We run a cluster that handles 5M messages/sec, and it has taught us some valuable lessons. We have seen our Kafka clusters become sluggish or crash, taking our production services down with them, and we hope the insights we gained will help you navigate your next production incident and keep your data pipelines running smoothly. We’ll tell the story of skews and anomalies in CPU and disk metrics, drawing graphs and conclusions along the way. You’ll learn how compacted topics, partition distribution, and RAM can affect your cluster’s performance. Finally, we’ll look at how a small configuration drift can rattle your cluster. Our goal is to provide you with the tools and knowledge to navigate this uncharted territory.

Presenter

Or Arnon

ironSource

Born and raised in Tel Aviv, the startup hub of Israel and one of the best cities in the world. After serving in one of the IDF's elite intelligence units, I worked as a system administrator, a solution architect, and a DevOps engineer.

As a DevOps engineer, I thrive on problem-solving through collaboration, tooling, and building for scale. I aspire to make good things great.

As a team lead, I focus on the growth of the business and of our teams. I aim to build a team that is challenged and agile, and that takes pride in its work.

Most importantly, we're a people-centric company. Fun and fulfillment are part of our culture. My current role, as a DevOps team lead at ironSource, has been my most challenging and fulfilling one.

Presenter

Elad Eldor

ironSource

Elad Eldor is a data platform team leader at the mobile division of ironSource, working mainly with Druid, Kafka, Presto and Spark on AWS. He has 12 years of experience as a Java software engineer and 5 years as an SRE in big data Linux-based clusters.

Prior to ironSource, Elad was an SRE at Verint (currently Cognyte), where he developed big data applications (using Spark, Hadoop and Kafka) and handled the reliability and scalability of Spark and Kafka clusters in production. His main interests are JVM tuning, performance tuning and cost reduction of big data clusters (Kafka, Druid, Spark, Presto).
