실시간 움직이는 데이터가 가져다 줄 가치, Data in Motion Tour에서 확인하세요!
In our payments platform at Goldman Sachs Transaction Banking, Apache Kafka plays a critical role as the messaging bus in our micro-services architecture. Being a part of the financial service industry we need to ensure high-availability of our platform and quick response time during failures. In this talk we will explore how we monitor and alert on the health of our Kafka clusters using our heartbeat application and clients using DataDog dashboards. We will see how we consolidate JMX metrics such as error-rates, connection-rates, latencies and consumer lag from all producers and consumers using JMX agent sidecar to provide a live view of the health of our entire infrastructure. We will also discuss our culture of game days where we regularly test the resiliency of all the clients in our infrastructure by simulating various failure scenarios to improve the overall availability of our infrastructure.