Imagine having access to metrics, events, and insights without modifying code or redeploying your application. Imagine visualizing delays and tracking down performance bottlenecks in your Kafka pipeline instantly, with minimal performance overhead. In this session, we show that all of this is possible with eBPF.
In a live demo, we will introduce an eBPF-based, always-on CPU profiler to visualize where your Kafka applications spend their time. We will analyze how much time the Kafka broker spends handling different requests and responding to polls, and how much time a Kafka consumer spends polling the broker and processing messages. Furthermore, we will see how to detect issues by measuring consumer lag in both offsets and seconds, and how to correlate increasing consumer lag with the CPU flame graphs. We demonstrate how to not only detect issues quickly but also pinpoint performance bottlenecks in the Kafka pipeline, such as garbage collection and disk or network I/O.
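To make the two lag units concrete, here is a minimal sketch of how consumer lag could be computed per partition. It is not the implementation shown in the session: the `PartitionState` type, its field names, and the sampled `(offset, timestamp)` list are hypothetical, and it assumes that lag in seconds is defined as the age of the oldest record the consumer group has not yet read.

```python
from bisect import bisect_left
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class PartitionState:
    """Hypothetical snapshot of one partition as seen by a lag monitor."""
    log_end_offset: int      # offset the next produced record will receive
    committed_offset: int    # next offset the consumer group will read
    # Sampled (offset, produce_time_in_seconds) pairs, sorted by offset.
    produce_samples: List[Tuple[int, float]]


def lag_in_offsets(state: PartitionState) -> int:
    """Number of records produced but not yet consumed."""
    return max(0, state.log_end_offset - state.committed_offset)


def lag_in_seconds(state: PartitionState, now: float) -> Optional[float]:
    """Age of the oldest unconsumed record, or None if no sample covers it."""
    if lag_in_offsets(state) == 0:
        return 0.0
    offsets = [offset for offset, _ in state.produce_samples]
    # First sample at or after the committed offset approximates the
    # oldest record that is still waiting to be consumed.
    i = bisect_left(offsets, state.committed_offset)
    if i == len(offsets):
        return None
    return max(0.0, now - state.produce_samples[i][1])


state = PartitionState(
    log_end_offset=1000,
    committed_offset=940,
    produce_samples=[(900, 100.0), (940, 104.0), (999, 110.0)],
)
print(lag_in_offsets(state), lag_in_seconds(state, now=112.0))
```

Offset lag alone can mislead on low-traffic topics, where a lag of a few records may still represent minutes of delay, which is one reason to track both units.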
In addition, we will present some insights that are unique to eBPF, such as topic-centric flow graphs, consumer rebalancing lag, and under-replicated partitions.
Collecting all of this data with no instrumentation and low overhead is no easy task. We will conclude by revealing the magic of eBPF and discussing the design choices and technical challenges behind our network traffic tracer and Java CPU profiler, which enable deep visibility into Kafka.