Apache Kafka® and Confluent Platform are commonly used to put mission-critical business data in motion to gain timely critical insights and make proactive business decisions.
For a mission-critical Kafka deployment, it's imperative to have security and governance controls in place. These controls require their own infrastructure and metadata. A good architecture separates the infrastructure that supports the business application from the infrastructure that supports security and governance controls.
Confluent Cloud is architected this way in order to provide secure, reliable, and cost-effective Kafka-as-a-Service. This blog post demonstrates how to architect Confluent Platform deployments in this manner.
Accordingly, you want to eliminate as much noise from neighboring applications as possible in order to run your cluster at maximum efficiency. Typically, you end up deploying your cluster in a dedicated environment or network, and you expect that the cluster will serve only your client application data and that its bandwidth will not be consumed by noisy neighbors.
Normally, your Kafka or Confluent Platform-based cluster looks like this:
Whereas you want it to look like this:
With your cluster hosted in a dedicated environment, the next goal is to reduce the noise that is generated within the cluster itself. You would like to minimize the internal messages that are stored on Kafka internal topics, such as messages for monitoring, security, and cluster administration. Here’s an example of the load that Control Center generates:
Control Center state store size ~50 MB/hr
Kafka log size ~500 MB/hr (per broker)
Average CPU load ~7 %
Allocated Java on-heap memory ~580 MB and off-heap ~100 MB
Total allocated memory including page cache ~3.6 GB
Network read utilization ~150 KB/sec
Network write utilization ~170 KB/sec
Likewise, Confluent Metadata Service also creates internal topics, exposes additional listener ports (for JWT token validation and for administration), and makes search queries to LDAP, which can return a lot of records.
Can you also eliminate this noise? Not entirely, because you cannot remove the internal topics that the brokers themselves use. With Confluent Platform, however, you can minimize it to a large extent: most of your cluster bandwidth goes to processing your business data, and auxiliary data and message exchange are kept to a minimum.
With the advent of service mesh and containerized applications, the idea of separate control and data planes has become popular. The part of your application infrastructure, such as a proxy or sidecar, that is dedicated to aspects such as controlling traffic, access, governance, security, and monitoring is referred to as the control plane. The part of your application infrastructure that is used purely for processing your business transactions is referred to as the data plane.
Can you do this for your Kafka cluster? For example:
Yes, of course (as shown above). With very little effort and configuration this setup can be achieved for your Kafka clusters. The complete picture looks like the following:
The primary benefit of this pattern is a clear segregation between clusters that process your business data and a cluster that is responsible for administration and security. You can also have multiple data clusters which can be managed by a single cluster in your control plane, centralizing administration and control. Further, you can free up your data plane cluster's bandwidth for processing your business data.
Setting up a control plane for a single data cluster can be of value if your data cluster is mission critical, for example when it handles financial messages or transactions. Otherwise, the preferred approach is to use one control plane cluster to manage multiple data plane clusters.
The following sections detail each aspect of this setup.
Confluent Metrics Reporter is the component responsible for gathering JMX metrics and feeding them to internal topics on your Kafka brokers. You can have the metrics reporter feed these metrics to a remote Kafka broker instead of the local one, which frees your data plane Kafka brokers from handling any metrics-related message traffic. The configuration snippets below show how this is done.
A configuration snippet from the Kafka broker configuration file
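As a minimal sketch, the relevant part of a data plane broker's configuration might look like the following. The property names are the standard Confluent Metrics Reporter settings; the host name `control-plane-broker-1`, the port, and the replica count are placeholders assumed for illustration and are not taken from the original deployment.

```properties
# server.properties on a data plane broker (illustrative sketch)

# Enable the Confluent Metrics Reporter on this broker.
metric.reporters=io.confluent.metrics.reporter.ConfluentMetricsReporter

# Send metrics to the control plane cluster instead of the local broker.
# "control-plane-broker-1:9092" is a placeholder host and port.
confluent.metrics.reporter.bootstrap.servers=control-plane-broker-1:9092

# Replication factor for the metrics topic on the control plane cluster.
# Assumed value; match it to the size of your control plane cluster.
confluent.metrics.reporter.topic.replicas=3
```

If the control plane listener is secured, the usual client security settings can be supplied with the same `confluent.metrics.reporter.` prefix.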
Confluent Control Center can monitor multiple clusters as described in the documentation.
On your control plane cluster, you need to let Confluent Control Center discover the cluster(s) in your data plane so that it can retrieve their metadata, such as the cluster ID and the name that you give to each data cluster. In this case, the data cluster is named “data-plane.”
A configuration snippet from the Confluent Control Center configuration file
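A minimal sketch of such a configuration follows. It assumes Control Center runs against the control plane cluster and registers one additional cluster under the name `data-plane`; both host names are placeholders, not values from the original setup.

```properties
# control-center.properties on the control plane (illustrative sketch)

# Cluster that Control Center itself uses: the control plane brokers.
# "control-plane-broker-1" is a placeholder host.
bootstrap.servers=control-plane-broker-1:9092

# Additional monitored cluster, registered under the name "data-plane".
# The name segment in the property key is what Control Center displays.
# "data-plane-broker-1" is a placeholder host.
confluent.controlcenter.kafka.data-plane.bootstrap.servers=data-plane-broker-1:9092
```

Further data plane clusters could be added by repeating the last property with a different name segment.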
Confluent Control Center is used here for demonstration purposes only. Alternatively, you could use the Prometheus JMX agent to export these metrics to your favorite monitoring platform (e.g., Prometheus/Grafana or Datadog, etc.). The point is that you have moved the metrics traffic away from your data plane Kafka brokers.
The Confluent Metadata Service (MDS) is the component responsible for authorization across all client applications as well as Confluent Platform internal components. It is also responsible for authentication for platform components such as Schema Registry or ksqlDB. To do this, it needs an identity provider (an LDAP-based directory service) in which to look up identity records. Now this seems like a lot of work, so why not delegate it to an external cluster (control plane) and free up your (data plane) cluster for solely processing your business data?
A single Confluent Metadata Service can manage multiple clusters. This example shows just one cluster in the data plane.
There are two sections below that provide an excerpt from the Kafka broker configuration file on the data plane. In the first section you can see how the broker is made to connect to a remote Metadata Service.
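As a rough sketch of that first part, a data plane broker can be pointed at the cluster that hosts MDS with settings along these lines. The property names come from the Confluent Server Authorizer configuration; the host name is a placeholder, and the `ZK_ACL` provider assumes a ZooKeeper-based deployment.

```properties
# server.properties on a data plane broker (illustrative sketch)

# Use the Confluent Server Authorizer so that role bindings are enforced.
authorizer.class.name=io.confluent.kafka.security.authorizer.ConfluentServerAuthorizer

# CONFLUENT enables role-based rules from MDS; ZK_ACL keeps existing
# ZooKeeper ACLs (assumption: adjust or drop it for your deployment).
confluent.authorizer.access.rule.providers=CONFLUENT,ZK_ACL

# Read authorization metadata from the control plane cluster that hosts MDS.
# "control-plane-broker-1:9092" is a placeholder host and port.
confluent.metadata.bootstrap.servers=control-plane-broker-1:9092
```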
In the configuration snippet below there are two things going on. First, a JWT token is obtained by authenticating with the remote (control plane) broker using HTTP basic authentication. Second, the token is submitted to the token listener port (9092) to authenticate and get the required access to the topic on the remote broker.
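One way to express this token flow is sketched below. It assumes the settings are applied with the `confluent.metadata.` prefix on the data plane broker, that MDS listens on its default HTTP port 8090, and that the connection is unencrypted. The user name, password, and host names are placeholders, and the exact prefix used in the original deployment may differ.

```properties
# server.properties on a data plane broker (illustrative sketch)

# Authenticate to the control plane token listener with OAUTHBEARER.
# SASL_PLAINTEXT is assumed here; use SASL_SSL where TLS is enabled.
confluent.metadata.security.protocol=SASL_PLAINTEXT
confluent.metadata.sasl.mechanism=OAUTHBEARER

# Login handler that exchanges the credentials below for a JWT token
# by calling MDS over HTTP with basic authentication.
confluent.metadata.sasl.login.callback.handler.class=io.confluent.kafka.clients.plugins.auth.token.TokenUserLoginCallbackHandler

# Placeholder credentials and MDS URL; the token returned by MDS is then
# presented to the token listener on the control plane broker.
confluent.metadata.sasl.jaas.config=org.apache.kafka.common.security.oauthbearer.OAuthBearerLoginModule required \
  username="<service-user>" \
  password="<service-password>" \
  metadataServerUrls="http://control-plane-broker-1:8090";
```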
In addition to externally controlling access, it is also possible to centralize audit logging on another Confluent Platform server cluster, as described in the centralized audit logs documentation. You can also use Confluent Cluster Registry to centrally manage your clusters.
The configuration files for this example can be found at https://github.com/sanjaygarde/cp-kafka-with-planes, and include:
Terraform scripts for provisioning nodes for control and data planes
cp-ansible scripts for building the clusters
Configuration files for the control and data planes
As you can see, it's simple to externalize monitoring, authorization, and administration for Confluent Platform-based clusters. However, if you are considering or using Confluent Cloud then you are in luck as this abstraction of control and data planes is inherent in Confluent Cloud.
Each of your Kafka clusters is used purely for processing your business events (messages). Separate Kafka clusters handle your metrics, security (authentication, identity federation/SSO, authorization, audit log, etc.), monitoring, and administration.
Moreover, with Confluent Cloud, Schema Registry, Kafka Streams, and ksqlDB are also hosted on separate clusters, leaving your Kafka cluster to process only your business data and assuring the full bandwidth of your cluster for your business.
If you’d like to try Confluent Cloud, new sign-ups receive $400 in free usage! And be sure to use the code CL60BLOG for an additional $60 of free usage.