Project Metamorphosis: Unveiling the next-gen event streaming platform.Learn More
Confluent Platform

Monitoring Confluent Platform with Datadog

Earning customer love is a core value at Confluent, and like all relationships, listening makes the love flourish. When it comes to monitoring, we’ve heard you, and we are pleased to announce a new agent-based integration with Datadog for monitoring Confluent Platform. With this new integration, our mutual customers will be able to monitor Confluent Platform alongside the rest of their IT infrastructure components and leverage the flexible capabilities of Datadog’s monitoring solution.

Confluent Control Center continues to be a great option for the fully integrated view of Confluent Platform when you are looking to monitor and manage in a single UI experience. It provides a highly opinionated view of metrics for helping make administrative decisions, and we will continue our investment in this area. However, we also know that our customers need to have an integrated monitoring view across their entire IT stack to feel confident in production, so we’re partnering with Datadog to make that as easy as possible.

How it works

Datadog has had an Apache Kafka® integration for monitoring self-managed broker installations (and associated Apache ZooKeeper™ deployments) with their Datadog Agent for several years. The Confluent Platform integration adds several new capabilities:

  • Monitoring for Kafka Connect, ksqlDB, Confluent Schema Registry, and Confluent REST Proxy
  • Monitoring for Java-based Kafka clients
  • Default Confluent Platform dashboard with the most critical metrics
  • Optionally configured log collection

This integration comes pre-installed starting in Agent v6.19 and v7.19. Users unable to upgrade Agent versions can install the integration on the Agent with a simple command:

sudo -u dd-agent datadog-agent integration install datadog-confluent_platform==1.0.0

Once this integration is installed, start pulling in the JMX metrics from Confluent Platform services by going to the confluent_platform.d directory in the Agent’s conf.d directory, and take a look at the conf.yaml.example file located there. Then modify the instances stanza to add all of the hosts you want to monitor. Restart the Agent, and you will start to see metrics emitted to Datadog for all of the Confluent Platform services that you have configured. The Agent will also start collecting host-level system metrics related to CPU, memory, disk I/O, and network I/O. For more details, see the documentation.

Here’s an example of what the default dashboard that comes with the integration looks like once you have configured the Agent:

Datadog Dashboard

This dashboard gives you a starting point for showcasing the different components. We recommend that you copy this dashboard and then start slicing and dicing the data as you need to for your business.

Control Center 🤝 Datadog: Better together

If you use Datadog as your central monitoring tool, there is complementary value in utilizing Control Center’s management capabilities. Take a look at the Failed Task Count by Worker graph in the dashboard above. How do you know which steps to take next? By using Control Center’s capabilities to drill into Connect issues and fix them in the same UI, you can first see at a high level how many connectors are impacted by the failed tasks:

All Connect Clusters

Then you can get further details on the tasks from the degraded connector to get an idea of which hosts are impacted so you can determine if the problem is with the connector or with a particular set of tasks. In this case, you can see that the connector as a whole has failed, so you know the problem is likely with the configuration of the connector or with a downstream system. You can start debugging the connector configuration directly in the Control Center UI.file-sink9

You can take a similar approach when you see anomalies in the number of ksqlDB queries running. Most clusters will experience a steady rate of querying, but what if you start to see a spike in running queries in the Datadog dashboard? You can use Control Center to take a look at what queries are running to find out if there is a new, unexpected use case that is driving the load.

ksqlDB Queries

If you are using Datadog today, go try the integration! We’re excited to expand our integrations with popular monitoring tools to help our customers get the maximum value across their organization from Confluent Platform. If you have a monitoring tool that you’d like to see integrated in a more seamless way, please send us your feedback in #observability on the Community Slack!

Dustin Cote has spent over four years helping customers wrangle their data and infrastructure using a variety of open source technologies ranging from Apache™ Hadoop® to Kafka. At Confluent, he helps customers stabilize and scale their deployments of Apache Kafka.

Did you like this blog post? Share it now

Subscribe to the Confluent blog

More Articles Like This

Introducing Confluent Platform 5.5

We are pleased to announce the release of Confluent Platform 5.5. With this release, Confluent makes event streaming more broadly accessible to developers of all backgrounds, enhancing three categories of […]

15 Things Every Apache Kafka Engineer Should Know About Confluent Replicator

Single-cluster deployments of Apache Kafka® are rare. Most medium to large deployments employ more than one Kafka cluster, and even the smallest use cases include development, testing, and production clusters. […]

Turning Data at REST into Data in Motion with Kafka Streams

The world is changing fast, and keeping up can be hard. Companies must evolve their IT to stay modern, providing services that are more and more sophisticated to their customers. […]

Sign Up Now

Receive up to $50 USD off your bill each calendar month for the first three months.

New signups only.

By clicking “sign up” above you understand we will process your personal information in accordance with our Privacy Policy.

By clicking "sign up" above you agree to the Terms of Service and to receive occasional marketing emails from Confluent. You also understand that we will process your personal information in accordance with our Privacy Policy.

Free Forever on a Single Kafka Broker

The software will allow unlimited-time usage of commercial features on a single Kafka broker. Upon adding a second broker, a 30-day timer will automatically start on commercial features, which cannot be reset by moving back to one broker.

Select Deployment Type
Manual Deployment
  • tar
  • zip
  • deb
  • rpm
  • docker
Auto Deployment
  • kubernetes
  • ansible

By clicking "download free" above you understand we will process your personal information in accordance with our Privacy Policy.

By clicking "download free" above, you agree to the Confluent License Agreement and to receive occasional marketing emails from Confluent. You also agree that your personal data will be processed in accordance with our Privacy Policy.

This website uses cookies to enhance user experience and to analyze performance and traffic on our website. We also share information about your use of our site with our social media, advertising, and analytics partners.