S'inscrire à la démo | RBAC à grande échelle, Oracle CDC Source Connector et bien plus encore avec le lancement T2 de Confluent Cloud

Bringing Your Own Monitoring (BYOM) with Confluent Cloud

As data flows in and out of your Confluent Cloud clusters, it’s imperative to monitor their behavior. Bring Your Own Monitoring (BYOM) means you can configure an application performance monitoring (APM) product with a Confluent Cloud cluster in order to monitor the telemetry flowing through the cluster. Popular APM products include Prometheus, Grafana, Splunk, and Datadog. Confluent Cloud provides a Metrics API to return the performance data for throughput, latency, and other metrics that inform operators how the cluster is performing.

This blog post uses the Metrics API, Docker, and Prometheus, Grafana, Splunk, and Datadog to put together a full monitoring solution for your Confluent Cloud deployment.

Docker, Prometheus, Grafana, Splunk, and Datadog

Docker Compose is used to coordinate the running of all the local services in
our monitoring stack connected to Confluent Cloud. Each section below walks you through the steps required to set up each service.

First, you need to create credentials for our services to connect to the Metrics API. If you haven’t already, install the Confluent CLI. Please make sure that you have an active cloud cluster running before trying to create any Confluent CLI monitoring credentials. Note that the deprecated ccloud has now been migrated to the Confluent CLI confluent. To spin up a new Confluent Cloud cluster check out the quick start guide on Confluent Developer.

Create a new Confluent Cloud API-key

An API-key with cloud-level privileges has to be used to retrieve the Confluent Cloud metrics from the Metrics API. The cloud-level resource gives the API-key access to all of the resources in a Confluent Cloud cluster.

Once you have the API-key and secret, assign it to a service account and give it the role MetricsViewer. Please refer to the Metrics API documentation to see how to create a service account and assign the MetricsViewer role.

Setting up Prometheus and Grafana for Confluent Cloud

So what are Prometheus and Grafana? Prometheus is a time series database that scrapes metrics from data sources. Grafana is an observability platform that takes telemetry data from different sources and visualizes it in dashboards using graphs and charts.

Prometheus and Grafana work well together because Prometheus can gather data from a service’s open port and make the data available to Grafana to query to display as a visualization.

Create the following docker-compose.yml, adding a service for Prometheus and Grafana. The Prometheus container exposes its port on 9090. The Grafana service exposes its endpoint on port 3000, allowing a user to see the final visualization of the metrics in its UI.

prometheus:
 image: prom/prometheus
 hostname: prometheus
 container_name: prometheus
 ports:
   - 9090:9090
 restart: always
 volumes:
   - ./prometheus/:/etc/prometheus/

grafana:
 image: grafana/grafana:7.4.1
 hostname: grafana
 container_name: grafana
 environment:
   GF_INSTALL_PLUGINS: grafana-piechart-panel
 ports:
   - 3000:3000
 volumes:
   - ./grafana/provisioning/:/etc/grafana/provisioning/
 depends_on:
   - prometheus

Next, create the following prometheus.yml file. In this file, specify the endpoint in the static_config.targets property where Prometheus scrapes the data from the Metrics API. You also can configure the frequency at which data is scraped by using the scrape_interval property.

global:
 scrape_interval:     1m # By default, scrape targets every 15 seconds.
 evaluation_interval: 1m # By default, scrape targets every 15 seconds.

Scrape_configs:
- job_name: Confluent Cloud
    scrape_interval: 1m
    scrape_timeout: 1m
    honor_timestamps: true
    static_configs:
      - targets:
        - api.telemetry.confluent.cloud
    scheme: https
    basic_auth:
      username: 
      password: 
    metrics_path: /v2/metrics/cloud/export
    params:
      "resource.kafka.id":
        - lkc-1234
        - lkc-2345
	"resource.connector.id":
        - lcc-1234
        - lcc-2345
  • Targets: Endpoint for the Metrics API.
  • Basic_auth: Specify the API key and secret.
  • Params: List the resource labels for the Confluent Connect, Apache Kafka®, ksqlDB, and Schema Registry clusters.
    To find the resource IDs for these components, please run the following commands on the Confluent CLI:

    • Kafka: confluent kafka cluster describe
    • Connect: confluent connect list
    • Schema Registry: confluent schema-registry cluster describe
    • ksqlDB: confluent ksql app list
  • Metrics_path: This is the export path of the endpoint that hosts the metrics.

Create a file named dashboard.yml. In order to set up Grafana successfully, you have to configure the predefined Confluent Cloud dashboard to use the Prometheus data source.

apiVersion: 1

providers:
- name: 'Prometheus'
 orgId: 1
 folder: ''
 type: file
 disableDeletion: false
 editable: true
 options:
 path: /etc/grafana/provisioning/dashboards

Once all of these files are set up correctly, you can run the following:

docker-compose up -d

You can see the Grafana dashboard with the data from the Metrics API by opening a browser to http://localhost:3000/ and the Prometheus site at http://localhost:9090/.

Grafana dashboard

 

Metrics pipeline to Grafana

You can get a pre-configured Grafana dashboard for the Metrics API output by visiting the kafka-docker-playground GitHub repository.

Setting up Datadog with Confluent Cloud

Setting up Datadog with Confluent Cloud is now easier than ever. There’s no need to spin up a Docker container any longer. Simply go to your Datadog homepage, click on Integrations in the left panel and then search for “Confluent Cloud.”

Confluent Cloud integration on Datadog

Install the integration. Once it is installed, click on “Add API-Key” and add your MetricsViewer role-based Confluent Cloud API-key and secret. Afterwards, add your Kafka cluster resource ID or any other component resource IDs that are hosted in Confluent Cloud.

MetricsViewer role-based Confluent Cloud API-key and secret

Once you add your cluster to Datadog, wait a few minutes for the metrics to flow in and then create a dashboard.

Below is a screenshot of a sample Confluent Cloud dashboard in Datadog:

Confluent Cloud dashboard in Datadog

The dashboard above categorizes the fully managed connectors into producer and consumer metrics. There are eight panels in total. The top four panels show the number of records being written by the source connectors and the number of records being exported to the data source. The other panels show the throughput of the fully managed Confluent Cloud connectors.

Setting up Splunk with Confluent Cloud

Start by adding the Splunk service to the docker-compose.yml. The SPLUNK_HEC_TOKEN and SPLUNK_PASSWORD values belong to your Splunk account. These are secret values and you should protect the “docker-compose.yml” file appropriately. Then you can specify the port number where you want to access Splunk.

# Splunk Enterprise server:
splunk:
 image: splunk/splunk:latest
 container_name: splunk
 environment:
   - SPLUNK_START_ARGS=--accept-license
   - SPLUNK_HEC_TOKEN=00000000-0000-0000-0000-0000000000000
   - SPLUNK_PASSWORD=changeme
 ports:
   - 18000:8088
 healthcheck:
   test: [ 'CMD', 'curl', '-f', 'http://localhost:8888' ]
   interval: 5s
   timeout: 5s
   retries: 20
 volumes:
   - ./splunk.yml:/tmp/defaults/default.yml
   - /opt/splunk/var
   - /opt/splunk/etc

Next, add another service called “Otelcollector,” short for OpenTelemetry Collector, to the docker-compose.yml. OpenTelemetry offers a vendor-agnostic way to receive, process, and export telemetry data. Splunk offers an OpenTelemetry integration that allows Splunk to get metrics from a data source using OpenTelemetry as the middleman. In the configuration below, the Telemetry Collector depends on the Splunk service.

# OpenTelemetry Collector
otelcollector:
 image: quay.io/signalfx/splunk-otel-collector:0.29.0
 container_name: otelcollector
 command: [ "--config=/etc/otel-collector-config.yml", "--log-level=DEBUG" ]
 volumes:
   - ./otel-collector-config.yml:/etc/otel-collector-config.yml
 depends_on:
   - splunk

Next, create a new file named open-telemetry.yml to specify the OpenTelemetry Collector configuration details.

Please reference the OpenTelemetry Collector documentation to learn how to scrape from Prometheus’s metrics endpoints, which gets the metrics from the Metrics API. The flow in which the metrics travel is listed below:

Metrics pipeline to Splunk

This workflow is based on the Prometheus Federation feature.

The Splunk account’s HTTP event collector (HEC) token must be provided. Below is an example of the open-telemetry.yml file. The OpenTelemetry config file references the Confluent Cloud job that Prometheus scrapes.

receivers:
 prometheus_simple:
   collection_interval: 10s
   # the federation endpoint:
   # Read more about it here: https://prometheus.io/docs/prometheus/latest/federation/
   # You can query the federation with PromQL, encoded as part of the query string.
   endpoint: prometheus:9090
   metrics_path: /federate
   params:
     match[]: '{job="Confluent Cloud"}'

 otlp:
   protocols:
     grpc:
       endpoint: 0.0.0.0:4317

exporters:
 splunk_hec/metrics:
   # Splunk HTTP Event Collector token.
   token: "00000000-0000-0000-0000-0000000000000"
   # URL to a Splunk instance to send data to.
   endpoint: "https://splunk:8088/services/collector"
   # Optional Splunk source: https://docs.splunk.com/Splexicon:Source
   source: "app:metrics"
   # Optional Splunk source type: https://docs.splunk.com/Splexicon:Sourcetype
   sourcetype: "prometheus"
   # Splunk index, optional name of the Splunk index targeted.
   index: "metrics"
   # Maximum HTTP connections to use simultaneously when sending data. Defaults to 100.
   max_connections: 20
   # Whether to disable gzip compression over HTTP. Defaults to false.
   disable_compression: false
   # HTTP timeout when sending data. Defaults to 10s.
   timeout: 10s
   # Whether to skip checking the certificate of the HEC endpoint when sending data over HTTPS. Defaults to false.
   # For this demo, we use a self-signed certificate on the Splunk docker instance, so this flag is set to true.
   insecure_skip_verify: true

processors:
 batch:

extensions:
 health_check:
   endpoint: 0.0.0.0:13133
 pprof:
   endpoint: :1888
 zpages:
   endpoint: :55679

service:
 extensions: [pprof, zpages, health_check]
 pipelines:
   metrics:
     receivers: [prometheus_simple]
     processors: [batch]
     exporters: [splunk_hec/metrics]  

Last, set up the Splunk configuration file by creating a splunk.yml in the same directory as the other two files. Below is an example of the Splunk config file. The OpenTelemetry Collector will forward the Confluent Cloud metrics to the Splunk endpoint: $SPLUNK_DB.

splunk:
 conf:
   indexes:
     directory: /opt/splunk/etc/apps/search/local
     content:
       metrics:
         coldPath: $SPLUNK_DB/metrics/colddb
         datatype: metric
         homePath: $SPLUNK_DB/metrics/db
         maxTotalDataSizeMB: 512000
         thawedPath: $SPLUNK_DB/metrics/thaweddb

Now, start up Splunk and OpenTelemetry.

docker-compose up -d

Once you verify that the applications are running successfully, visit Splunk at http://localhost:8088.

Splunk dashboard

Conclusion

Confluent Cloud offers Apache Kafka as a fully managed service and allows organizations to monitor the clusters with some of their favorite APMs. As the demand and popularity of Confluent Cloud grows, so will the number of monitoring solutions that can be used to integrate with it. Get started today with a free trial of Confluent Cloud and use the code CL60BLOG for an additional $60 of free usage.*

Get started

Niyi Odumosu is a solutions architect at Confluent who has an in-depth background in software and DevOps engineering. After working as an engineer in the finance, telecom, media, and ad-tech industries, Niyi joined Confluent to help their clients adopt Apache Kafka in their clusters. He has designed solutions to complex problems of mission-critical use cases for an array of Fortune 500 companies. This has given him a diverse exposure to organizational issues with tightly coupled technology systems. Niyi takes pride in helping companies along the data in motion journey that leverages event-based streaming for real-time data movement.

Did you like this blog post? Share it now

Subscribe to the Confluent blog

More Articles Like This

RBAC at Scale, Oracle CDC Source Connector, and More – Q2’22 Confluent Cloud Launch

The Confluent Q2 ‘22 cloud bundle, our latest set of product launches, is live and packed full of new features to help your business innovate quickly with real-time data streaming.

Announcing the Confluent Q1 ‘22 Launch

The Confluent Q1 ‘22 Launch is live and packed full of new features that enable businesses to continue innovating quickly with real-time experiences fueled by data in motion. Our quarterly

Introducing Health+ with Confluent Platform 6.2

For a modern, software-defined business, a platform for data in motion is critical to connecting every part of a vast digital architecture across an organization to harness the flow of