Apache Kafka is well known as a low-latency, high-throughput, and highly configurable streaming platform. At AWS, we run thousands of Kafka clusters, each with its own hardware and software configuration. Managing such a large and diverse fleet has taught us several operational lessons, and we would like to share some of them with you.
We’ll cover several topics, including (a) monitoring Kafka health, (b) optimizing Kafka to address compute, storage, and networking bottlenecks, (c) automating the detection and mitigation of compute, storage, and networking infrastructure failures, and (d) continuous software patching.