Nearly 3 years of using Kafka without really understanding the documentation has taught us a lot.
At FINN.no, Norway’s biggest online classifieds site, we have been using Kafka since version 0.8-beta in early 2013. Kafka was introduced as a major part of a proof of concept for collecting click events from our web front end, which had approximately 40 million page views a day. The Kafka cluster became popular, and more and more teams started using it with our default settings. Still, we did not have anyone dedicated to operating and configuring Kafka; operations were best effort, and usually the latest person to touch the defaults won. As the traffic and the number of use cases and topics grew, we ran into scaling problems, stability problems, lost messages, and a general feeling among developers that “our Kafka cluster is unstable”. In the end we had to put some effort into finding out what we were running and which properties we really needed, and into getting control over the configuration, our library code, and how our clients used Kafka. Since August we have not had a single complaint about our Kafka cluster, or any downtime on it.
We will cover our top 5 mistakes:
- No consideration of data on the inside vs. data on the outside
- Lack of schema
- One configuration fits all
- 128 partitions for everyone
- Running on already overloaded servers
The Germans say “Schadenfreude ist die schönste Freude” (roughly, “joy at another’s misfortune is the finest joy”). In this session we’ll tell you what we learned, so you can avoid making the same mistakes.