Confluent for Kubernetes (CFK) now supports Cluster Linking through a declarative API, so you can connect public and private cloud environments declaratively.
Confluent for Kubernetes provides a complete, declarative API-driven experience for deploying and self-managing Confluent Platform as a cloud-native system. Cluster Linking is an easy-to-use, secure, and cost-effective data migration and geo-replication solution to seamlessly and reliably connect applications and data systems across your hybrid architectures. With the release of Confluent Platform 7.0, Confluent announced the general availability of Cluster Linking in Confluent Platform and Confluent Cloud. Using CFK’s declarative API for Cluster Linking, you can now create hybrid cloud environments with the infrastructure as code (IaC) model.
This blog details the technical challenges faced while developing a declarative API for Cluster Linking and the opinionated choices made to deliver the best user experience. It also shows how simple and clean it is to set up a hybrid cloud environment with CFK’s Cluster Linking API. The example sets up a cluster link between an Apache Kafka® cluster in Confluent Cloud (a fully managed, cloud-native service for connecting and processing all of your real-time data) and a local Confluent cluster on Kubernetes managed by CFK. The link migrates data from the on-prem cluster to the cloud cluster, demonstrating the movement of data between private and public cloud infrastructure.
CFK’s declarative APIs enable you to leave the infrastructure management to Confluent’s intelligent, software-based automation, thereby freeing you to focus solely on your real-time business applications. These APIs enable you to express the desired state of your infrastructure and applications as code, in the form of a collection of YAML files. You can check the YAML files into a Git repository so that your teams can collaborate on managing these environments. With CI/CD systems, the YAML files can be pulled from Git to deploy updates to the Confluent environments in development, QA, and then production. This entire paradigm is referred to as GitOps.
Confluent for Kubernetes provides a declarative ClusterLink API to configure and manage Cluster Linking. This declarative API allows you to define the desired state of your cluster links by defining a ClusterLink CustomResource (CR) based on the ClusterLink CustomResourceDefinition (CRD). Confluent for Kubernetes then creates the cluster link and ensures that it maintains the desired state defined by the CustomResource.
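A cluster link in the ClusterLink declarative API is defined by several components. The example CR later in this post uses the following: destinationKafkaCluster, the cluster in which the link is created, referenced through a kafkaRestClassRef; sourceKafkaCluster, the cluster that data is mirrored from, described by its bootstrapEndpoint, authentication, and kafkaRestClassRef; and mirrorTopics, the list of topics to mirror over the link.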
A few opinionated choices were made while implementing this API in order to achieve the best user experience. One important part of the source cluster configuration is authentication and TLS information. TLS certificates can be supplied in two ways: by placing them on the destination cluster and adding their path to the configuration, or by passing the raw certificates as configuration. We chose to pass the certificates directly via secrets or HashiCorp Vault. This eliminates the extra step of ensuring that the certificates are present in the Kafka cluster, and it avoids changing the Kafka CR to mount the certificates into the Kafka brokers, which would in turn require a rolling restart of the Kafka cluster. Another decision was to accept a clusterID in this API. While the clusterID for the source cluster can be obtained from sourceKafkaCluster.kafkaRestClassRef, we decided to also accept clusterID directly for use cases where the source cluster does not expose its REST admin endpoints outside its network.
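As a rough illustration of these two choices, the source cluster portion of a ClusterLink spec might look like the fragment below. This is a sketch under stated assumptions, not a copy of the CFK reference: the `tls` block under `sourceKafkaCluster`, the secret name `source-tls-certs`, the bootstrap endpoint, and the cluster ID placeholder are all hypothetical, so check the ClusterLink CRD for the exact field names before using it.

```yaml
# Fragment of a ClusterLink spec (illustrative only, not a complete manifest).
spec:
  sourceKafkaCluster:
    # Hypothetical bootstrap endpoint of the source cluster.
    bootstrapEndpoint: kafka.example.internal:9092
    # Supplied directly, for the case where the source cluster's REST admin
    # endpoint is not reachable and kafkaRestClassRef cannot be used.
    clusterID: "<source-cluster-id>"
    authentication:
      type: plain
      jaasConfig:
        secretRef: plainpassjks
    # Assumed layout: certificates come from a Kubernetes secret rather than
    # from files mounted into the Kafka brokers.
    tls:
      enabled: true
      secretRef: source-tls-certs
```

Because the certificates travel with the ClusterLink resource, the Kafka CR itself stays untouched and no broker roll is triggered.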
Overall it was an interesting technical challenge to develop a declarative API that provides the best user experience while ensuring it is not too restrictive for Confluent users. Once you have the CustomResource YAML file, all you need to do is to run kubectl apply -f clusterlink.yaml. You can have multiple ClusterLink definitions that link different environments within an organization. All these YAML files can be maintained in a Git repository (GitOps model), allowing your team to collaborate using a readable declarative API.
Confluent for Kubernetes brings a cloud-native experience for data in motion workloads to on-premises environments. It provides simplicity, flexibility, and efficiency without the headaches and burdens of complex, Kafka-related infrastructure operations. Cluster Linking allows you to connect on-prem and cloud environments by securely, reliably, and effortlessly creating a bridge between them. The declarative API for Cluster Linking takes this combination one step further by providing a cloud-native way to configure the bridge between on-prem and cloud.
The following example sets up a hybrid cloud environment by configuring a cluster link between Confluent Platform running on-prem and managed by Confluent for Kubernetes, and another Kafka cluster running on fully managed Confluent Cloud. Data produced into topics in the on-prem cluster will be transparently replicated to the corresponding topic in the cloud cluster.
apiVersion: platform.confluent.io/v1beta1
kind: ClusterLink
metadata:
  name: clusterlink-demo
  namespace: operator
spec:
  destinationKafkaCluster:
    kafkaRestClassRef:
      name: krc-cloud
      namespace: operator
  sourceKafkaCluster:
    authentication:
      type: plain
      jaasConfig:
        secretRef: plainpassjks
    bootstrapEndpoint: clink.platformops.dev.gcp.devel.cpdev.cloud:9092
    kafkaRestClassRef:
      name: krc-cfk
      namespace: operator
  mirrorTopics:
  - name: demo-topic
---
apiVersion: platform.confluent.io/v1beta1
kind: KafkaRestClass
metadata:
  name: krc-cloud
  namespace: operator
spec:
  kafkaClusterRef:
    name: kafka
  kafkaRest:
    endpoint: https://pkc-cloud.us-west4.gcp.confluent.cloud:443
    kafkaClusterID: lkc-cloud
    authentication:
      type: basic
      basic:
        secretRef: restclass-ccloud
    tls:
      secretRef: ccloud-tls-certs
You can create a cluster link in the Kafka cluster running on Confluent Cloud with the CR above. Refer to the Create a Kafka Cluster in Confluent Cloud and Manage Topics documentation for instructions on how to create a cluster and the required topics. This Kafka cluster is configured with SASL-SSL for external listeners, which means that in CFK you need to configure the appropriate authentication and TLS certificate information. Any client of this cluster, including a cluster link, must have the right authentication information and certificates.
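The KafkaRestClass for the cloud cluster reads its credentials from the secret named restclass-ccloud. A minimal sketch of such a secret follows. The key name `basic.txt` and its `username=`/`password=` layout are assumptions about what CFK expects for basic authentication, and the two values are placeholders for a Confluent Cloud API key and secret, so verify the format against the CFK authentication documentation.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: restclass-ccloud   # referenced by the krc-cloud KafkaRestClass
  namespace: operator
type: Opaque
stringData:
  # Assumed key name and file format; replace the placeholders before applying.
  basic.txt: |
    username=<cloud-api-key>
    password=<cloud-api-secret>
```

The ccloud-tls-certs secret would be created the same way, holding the certificates needed to trust the Confluent Cloud endpoint.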
Next, you have Confluent Platform deployed on-prem using CFK. CustomResource files for Confluent Platform, together with the steps to bring up a cluster using CFK, are defined in this GitHub repository. This brings up a Kafka cluster and ZooKeeper in SASL plain mode. Once the on-prem cluster is up and running, you can set up the cluster link between the cloud cluster and the on-prem cluster. The link treats the on-prem cluster as the source cluster and the cloud cluster as the destination cluster, and syncs messages for the mirror topics.
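For reference, the ClusterLink CR points at a KafkaRestClass named krc-cfk for the on-prem source cluster, which the manifest in this post does not define. A minimal sketch, assuming the on-prem Kafka CR is named `kafka` in the `operator` namespace and that CFK can resolve the REST endpoint from that reference alone:

```yaml
apiVersion: platform.confluent.io/v1beta1
kind: KafkaRestClass
metadata:
  name: krc-cfk            # name used by sourceKafkaCluster.kafkaRestClassRef
  namespace: operator
spec:
  # Assumption: the on-prem Kafka CR is called "kafka" and lives in the same
  # namespace, so no explicit endpoint is given here.
  kafkaClusterRef:
    name: kafka
```

If the on-prem REST endpoint requires authentication or TLS, this resource would also need the corresponding `kafkaRest` settings, as the krc-cloud definition does.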
With these simple steps, you have a hybrid environment where data is moved from the on-prem cluster to the cloud cluster in a transparent way and on-prem and cloud resources are managed in a cloud-native way with CFK.
Confluent for Kubernetes was built with the aim of delivering cloud-native Confluent across all your environments. Cluster Linking was built to provide an easy-to-use, secure, and cost-effective data migration and geo-replication solution to seamlessly and reliably connect applications and data systems across your hybrid architectures. With declarative API support for Cluster Linking, users have a simple, clean, and cloud-native way to set up a hybrid cloud environment. If you would like to set up a hybrid cloud with CFK, sign up for a free trial of Confluent Cloud and download Confluent Platform and CFK. You can use the promo code CL60BLOG for an additional $60 of free cloud usage.