Monitoring and Resiliency Testing our Apache Kafka Clusters at Goldman Sachs

In our payments platform at Goldman Sachs Transaction Banking, Apache Kafka plays a critical role as the messaging bus in our microservices architecture. As part of the financial services industry, we need to ensure high availability of our platform and quick response times during failures. In this talk we will explore how we monitor and alert on the health of our Kafka clusters with our heartbeat application, and on the health of our clients with Datadog dashboards. We will see how we consolidate JMX metrics such as error rates, connection rates, latencies and consumer lag from all producers and consumers, using a JMX agent sidecar, to provide a live view of the health of our entire infrastructure. We will also discuss our culture of game days, where we regularly test the resiliency of all the clients in our infrastructure by simulating various failure scenarios, with the goal of improving the overall availability of our infrastructure.
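
As a rough illustration of one of the metrics mentioned above: consumer lag for a partition is commonly computed as the gap between the partition's log end offset and the consumer group's committed offset. The sketch below is not the implementation described in the talk. The function names, the alert threshold, and the sample offsets are all hypothetical, and the offsets are passed in as plain dictionaries rather than fetched from a real cluster.

```python
from typing import Dict, List, Tuple

# (topic, partition) pair; a hypothetical stand-in for a Kafka topic-partition.
TopicPartition = Tuple[str, int]


def consumer_lag(
    end_offsets: Dict[TopicPartition, int],
    committed_offsets: Dict[TopicPartition, int],
) -> Dict[TopicPartition, int]:
    """Return per-partition lag as end offset minus committed offset."""
    lag = {}
    for tp, end in end_offsets.items():
        # Assumption: a partition with no committed offset is treated as
        # committed at 0, so its whole log counts as lag.
        committed = committed_offsets.get(tp, 0)
        # Clamp at zero in case the committed offset was read after the end offset.
        lag[tp] = max(end - committed, 0)
    return lag


def lagging_partitions(
    lag: Dict[TopicPartition, int], threshold: int
) -> List[TopicPartition]:
    """Return the partitions whose lag exceeds the (hypothetical) alert threshold."""
    return sorted(tp for tp, value in lag.items() if value > threshold)


if __name__ == "__main__":
    # Made-up sample offsets for a hypothetical "payments" topic.
    end = {("payments", 0): 120, ("payments", 1): 80}
    committed = {("payments", 0): 100}
    lag = consumer_lag(end, committed)
    print(lag)
    print(lagging_partitions(lag, threshold=50))
```

In a setup like the one described, a value of this kind would typically be exposed as a metric and alerted on from a dashboard rather than computed ad hoc.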
