Are you running at scale? Have you experienced “voodoo problems” in your infrastructure? Our cluster, which handles 5M messages/sec, taught us some valuable lessons. After seeing our Kafka clusters become sluggish or crash, taking our production services with them, we gathered insights that we hope will help you handle your next production incident and keep your data pipelines running smoothly. We’ll tell the story of skews and anomalies in CPU and disk metrics, drawing graphs and conclusions along the way. You’ll understand how compacted topics, partition distribution, and RAM can affect your cluster’s performance. Finally, we’ll look at how a small configuration drift can rattle your cluster. Our goal is to provide you with the tools and knowledge to navigate this uncharted territory.
Presenter
Or Arnon
ironSource
Born and raised in Tel Aviv, the startup hub of Israel and one of the best cities in the world.
Since serving in one of the IDF's elite intelligence units, I have worked as a system administrator, a solution architect, and a DevOps engineer.
As a DevOps engineer, I thrive on problem-solving through collaboration, tools, and building for scale. I aspire to make good things great.
As a team lead, I focus on the growth of the business and of our teams. I aim to build a team that is challenged, agile, and takes pride in their work.
Most importantly, we're a people-centric company. Fun and fulfillment are part of our culture.
My current role, as a DevOps team lead at ironSource, has been my most challenging and fulfilling one.
Presenter
Elad Eldor
ironSource
Elad Eldor is a data platform team leader at the mobile division of ironSource, working mainly with Druid, Kafka, Presto and Spark on AWS. He has 12 years of experience as a Java software engineer and 5 years as an SRE in big data Linux-based clusters.
Prior to ironSource, Elad was an SRE at Verint (currently Cognyte), where he developed big data applications (using Spark, Hadoop and Kafka) and handled the reliability and scalability of Spark and Kafka clusters in production. His main interests are JVM tuning, performance tuning and cost reduction of big data clusters (Kafka, Druid, Spark, Presto).