
Cross-Cloud Data Replication Over Private Networks With Confluent

Written by

Modern businesses don’t run in just one place. Your applications might live in Amazon Web Services (AWS), your analytics in Microsoft Azure, and critical systems on-premises. The challenge? Keeping all that data connected and flowing in real time—without adding complexity or risk. As more organizations adopt these multicloud strategies, the need for secure, private data replication has become critical.

That’s why Confluent is introducing cross-cloud data replication over private networks, making Confluent the first and only data streaming platform to offer this capability. Now you can move data securely between AWS, Azure, and Google Cloud even when your clusters sit on private networks. This secure cross-cloud replication over the Confluent backbone creates a global private streaming mesh and includes native enterprise-grade protections such as IP filtering, mutual TLS (mTLS) authentication, and application-level access controls.

Cross-cloud replication over private networks is powered by Cluster Linking, Confluent’s fully managed, offset-preserving replication service that mirrors topics across clusters. Cluster Linking already makes it simple to connect environments across regions, clouds, and hybrid deployments with near-zero data loss. Now, with private cross-cloud replication, the possibilities expand even further—enabling secure multicloud data sharing, disaster recovery, and compliance use cases that many organizations, particularly those in regulated industries, have struggled to solve for years.

The Need for Cross-Cloud Replication Over Private Networks

For enterprises running across multiple clouds, keeping data secure, connected, and always available is not optional; it’s mission-critical. Yet many organizations face strict barriers regarding security and compliance that prevent them from sharing data over the public internet. Private cross-cloud replication solves both technical challenges and business considerations, unlocking use cases that have been out of reach.

  • Data sharing: Share data seamlessly across environments without re-architecting existing systems, which is especially valuable when companies merge or teams collaborate and their data resides in different clouds. If only a single copy of the data is kept in AWS, Azure-based clients must constantly access it remotely, which can increase latency, drive up egress costs, and create performance bottlenecks. Replicating the data into Azure allows local consumption while keeping access secure and efficient.

  • Disaster recovery and resilience: Maintain standby clusters across cloud providers or regions. If one cloud experiences an outage, you can fail over while maintaining a near-zero recovery time objective (RTO) and recovery point objective (RPO). This allows your application to withstand cloud and regional failures.

  • Aggregated analytics: Consolidate data from multiple clouds into a single environment for unified reporting and advanced analytics.

  • Data residency and compliance: Keep data copies in specific geographic regions to meet regulatory requirements while still making it accessible for global applications. Sometimes a single cloud provider won’t have enough data centers in a given geography. Spanning beyond clouds can help you achieve resiliency while staying in compliance.

How Cluster Linking Powers Data Replication

Our vision for Cluster Linking was to create a simple, fully managed way to reliably replicate data between Kafka clusters anywhere. Today, that vision is a reality, offering a native solution that powers critical use cases from disaster recovery to cloud migration without the operational burden. Cluster Linking’s innate capabilities make it easy to address all of these cross-cloud use cases:

  • Data replication over private networks: Cluster Linking supports different networking configurations across clouds, making data sharing straightforward even for teams with complex setups. For example, the source cluster could be an Enterprise cluster on AWS using PrivateLink, while the destination cluster could be a Dedicated cluster on Azure with virtual network (VNet) peering. Cluster Linking can securely replicate data across these clusters.

  • Exact data replication: Cluster Linking copies topics along with offsets and consumer group offsets, so failover is as simple as updating client endpoints—achieving near-zero RTO and RPO.

  • Fully managed service: Teams can focus on building applications while Confluent Cloud handles the heavy lifting of infrastructure and replication.

Cluster Linking supports multiple networking patterns—such as private-to-private, private-to-public, and public-to-private—so you can replicate data across virtually any combination of clusters and clouds. For the full list of supported patterns and cluster details, refer to the documentation.

Private Cross-Cloud Cluster Linking in Action

In the past, replicating data across clouds over private networks meant running and managing Confluent Replicator yourself inside your own virtual private cloud (VPC) or VNet. The replicator was typically deployed in the destination cluster’s network, requiring you to set up a VPN connection between that network and the source cloud network that has access to the source cluster.

Beyond the operational overhead, this approach also came with limitations. Certain networking patterns simply weren’t supported. For instance, if your source cluster was running on AWS with VPC peering, you couldn’t connect a replicator running in an Azure VNet because VPC peering doesn’t support transitive routing.

Now that Cluster Linking supports cross-cloud replication over private networks through Confluent’s backbone, the process is both simpler and more reliable—eliminating the constraints that made this replication so challenging before.

Let’s see how to put this into practice by creating a global private mesh with Cluster Linking. We’ll walk through a simple, cost-effective way to replicate and sync Confluent Cloud topics across clouds over private networks.

In this scenario, two different business units (BUs) each operate on different cloud providers. BU A runs its applications on AWS, while BU B runs on Azure. Both need access to the same data in AWS, and each prefers to consume data locally to support fan-out patterns (1:few).

We’ll provision two Enterprise clusters—one on AWS and one on Azure—and use Cluster Linking to securely replicate data across clouds. Since Enterprise clusters are powered by PrivateLink, replication happens securely without traversing the public internet. This setup ensures that BU B can consume data natively within Azure, avoiding cross-cloud reads and thereby reducing both latency and data transfer costs.

While we aren’t performing client failover in this example, the secondary Azure cluster maintains an exact copy of the data. Therefore, this setup also provides a straightforward path for disaster recovery, which is covered later.

Infrastructure Provisioning

First, we need to set up both clusters. (If you already have these clusters in place, you can skip this step.) In our case, we’ll use a GitHub repository containing a Terraform script to deploy both clusters—one on AWS and one on Azure.

To deploy the Terraform script:

  1. Clone the repo locally.

  2. Run the following commands:

terraform init
terraform apply --auto-approve

The Terraform script provisions the clusters along with the necessary networking constructs (such as VPC/VNet endpoints). Since Enterprise clusters are private, we won’t have direct access to them from our local machines. To address this, the script also creates two bastion hosts, which we’ll use to access the topics user interface (UI).
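If you want to see exactly what was created, such as the cluster IDs and bastion host addresses that later commands reference, Terraform's standard output command prints every value the script exports. The specific output names are defined by the repository, so treat what you see there as the source of truth:

# print all values exported by the Terraform script
terraform output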

Produce Data to the Source Cluster

We’ll use Datagen Source Connector on the source (AWS) cluster to produce sample order data. We’ll log in to the machine that has access to the source cluster and then follow these steps:

  1. In the Connectors UI, select Datagen Source.

  2. Click Launch.

Sample data is now flowing to the AWS cluster.
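If you’d rather script this step than click through the UI, the same fully managed connector can also be created with the Confluent CLI from a JSON configuration file. The sketch below is one possible configuration, assuming an Avro-formatted orders topic; the connector name, topic name, file name, and API key placeholders are illustrative, so adapt them to your environment:

# Hypothetical Datagen Source configuration for sample order data
cat > datagen-orders.json <<EOF
{
  "name": "DatagenOrders",
  "connector.class": "DatagenSource",
  "kafka.auth.mode": "KAFKA_API_KEY",
  "kafka.api.key": "<source_api_key>",
  "kafka.api.secret": "<source_api_secret>",
  "kafka.topic": "orders",
  "output.data.format": "AVRO",
  "quickstart": "ORDERS",
  "tasks.max": "1"
}
EOF

# Create the connector on the source (AWS) cluster
confluent connect cluster create --config-file datagen-orders.json --cluster <source_cluster_id>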

Configure Cluster Linking

With the setup complete, we’re ready to replicate data from the AWS cluster to the Azure cluster. This is a destination-initiated cluster link, so we need to create it on the destination cluster (Azure).

A cluster link establishes a secure connection between source and destination clusters. Once the link is in place, you can configure which topics to mirror. These mirrored topics are read-only and managed entirely by the link. Messages produced to the source topic are mirrored “byte for byte,” preserving partitions and offsets. Consumers can read from mirrored topics just like any other topic.

Use the Confluent CLI to create both the cluster link and the mirrored topics. If you used the Terraform script from earlier, the required commands have already been generated in the ./replication_commands.txt file.

Create a cluster link by running the following command on the destination (Azure) bastion host. This will create a cluster link back to the source (AWS) cluster.

echo auto.create.mirror.topics.enable=true > link-config.properties
echo consumer.offset.sync.enable=true >> link-config.properties
echo auto.create.mirror.topics.filters={"topicFilters": [{"name": "*", "patternType": "LITERAL", "filterType": "INCLUDE"}]} >> link-config.properties

confluent kafka link create cross-cloud-link ^
--cluster <destination_cluster_id> ^
--source-cluster <source_cluster_id> ^
--source-bootstrap-server <source_cluster_bootstrap_server> ^
--source-api-key <source_api_key> ^
--source-api-secret <source_api_secret> ^
--config link-config.properties
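Once the command succeeds, you can confirm that the link is up directly from the CLI before configuring any mirror topics:

# list all cluster links on the destination cluster
confluent kafka link list --cluster <destination_cluster_id>

# show the state and configuration of the link we just created
confluent kafka link describe cross-cloud-link --cluster <destination_cluster_id>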

In this demo, the <source_api_key> and <source_api_secret> are generated automatically by the Terraform script. In a real production environment, however, you should use dedicated API keys that belong to a service account configured with the principle of least privilege. In practice, that means granting only the permissions required (see the sketch after this list):

  • DESCRIBE and READ on the topics you intend to mirror

  • DESCRIBE on the cluster itself
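As a rough sketch, those two permissions map to ACL bindings on the source cluster like the ones below. The service account ID is a placeholder, and exact flag spellings can vary between Confluent CLI versions, so check confluent kafka acl create --help before running them:

# allow the link's service account to read and describe the topics it will mirror (source cluster)
confluent kafka acl create --allow --service-account <source_service_account_id> --operations DESCRIBE,READ --topic "*" --cluster <source_cluster_id>

# allow the same service account to describe the source cluster itself
confluent kafka acl create --allow --service-account <source_service_account_id> --operations DESCRIBE --cluster-scope --cluster <source_cluster_id>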

Although our primary use case in this post is cross-cloud data sharing, the same setup can also serve as a foundation for disaster recovery.

In a disaster recovery scenario, you need seamless client failover. Applications should be able to switch from one cluster to another with minimal disruption. To make this possible, in the command above, two additional configurations were enabled on the cluster link:

  • auto.create.mirror.topics.enable: With this setting turned on, Cluster Linking detects new topics on the source cluster and automatically creates corresponding mirrored topics on the destination cluster, eliminating manual topic creation.

  • consumer.offset.sync.enable: Enabling this setting synchronizes consumer offsets from the source to the destination. When clients fail over from the AWS cluster to the Azure one, they can continue consuming from where they left off.

Together, these settings allow the cross-cloud data sharing setup to be used in a disaster recovery solution. Topics are mirrored automatically, offsets are kept in sync, and client failover becomes smooth and reliable.

Alternatively, if you need to replicate only a specific topic (rather than all topics), you can simply remove the auto.create.mirror.topics.enable setting. By default, this property is set to false.

In this case, you’ll need to explicitly create mirrored topics yourself. You can do this by running the following command for each topic you want to replicate:

confluent kafka mirror create <topic_name> --link cross-cloud-link --cluster <destination_cluster_id>

That’s it! You have now set up a cluster link between AWS and Azure and mirrored your topics across clouds. From this point forward, any messages written to the source topic in AWS will automatically appear in the mirrored topic on Azure.
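At this point you can also check the mirroring status, including replication lag, from the destination cluster:

# list all mirror topics created over the link, with their status and lag
confluent kafka mirror list --cluster <destination_cluster_id>

# drill into a single mirror topic
confluent kafka mirror describe <topic_name> --link cross-cloud-link --cluster <destination_cluster_id>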

There’s just one catch: The topic you mirrored is backed by a schema in Schema Registry. That means even though the messages are replicated into the destination (Azure) cluster, you won’t be able to consume them unless the schema is also available in the destination’s schema registry.

So how do we solve this? Enter Schema Linking.

Cross-Cloud Schema Synchronization

Schema Linking extends the same concept as Cluster Linking but for your schemas. It ensures that schemas registered in the source schema registry in AWS are automatically mirrored in the destination schema registry in Azure. With this in place, consumers in Azure can seamlessly read data from the mirrored topic without schema compatibility issues.

Setting up Schema Linking is super simple.

From your local laptop, set the active environment to your source environment (AWS).

confluent environment use <source_environment_id>

Create a configuration file (config.txt) with the destination schema registry details. If you used the Terraform script we provided, this is automatically created for you in the Terraform directory. It should look like this:

schema.registry.url=<destination_schema_registry_url>
basic.auth.credentials.source=USER_INFO
basic.auth.user.info=<destination_schema_registry_api_key>:<destination_schema_registry_api_secret>

  • <destination_schema_registry_url>: The endpoint for the Schema Registry in the Azure environment

  • <destination_schema_registry_api_key> / <destination_schema_registry_api_secret>: The API credentials for authenticating with the destination Schema Registry

Run the following to start exporting the schema from the source schema registry to the destination schema registry:

confluent schema-registry exporter create cross-cloud-exporter --subjects ":*:" --config ./config.txt

In this example, the --subjects ":*:" option means synchronize all subjects in all contexts from the source to the destination. This is especially useful if you want to ensure that both existing schemas and any future schemas are automatically synced to the destination schema registry.

If you prefer, you can also choose to synchronize only specific subjects. For more details, check out the Schema Linking documentation.
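Either way, you can verify that the exporter is running and that the subjects have landed in the destination Schema Registry. The commands below assume the exporter name used earlier; switch the active environment to the destination (Azure) before listing its subjects:

# check the exporter's state from the source environment
confluent schema-registry exporter get-status cross-cloud-exporter

# switch to the destination environment and confirm the subjects arrived (exported subjects live in a separate context)
confluent environment use <destination_environment_id>
confluent schema-registry subject list --prefix ":*:"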

Congratulations! You’ve successfully set up cross-cloud replication—mirroring both your data and associated schemas from AWS to Azure. Your consumers in Azure can now read from mirrored topics with full schema compatibility.

Verify Replication

Now that cross-cloud data replication is successfully set up, you can view the replicated messages in the topics UI of the destination cluster. From the destination bastion host, navigate to the destination cluster’s topics UI to inspect the mirrored topic and confirm that data is flowing correctly.

You’ll notice that this is a mirrored topic using the cross-cloud-link we created earlier. Thanks to Schema Linking, we’re able to view the actual messages in a readable format.
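If you prefer the CLI over the UI, you can also read a few records from the mirrored topic directly on the destination bastion host. The command below assumes the mirrored topic is Avro-encoded, that the CLI is logged in to the destination environment and cluster, and that the topic name placeholder is replaced with your own; the CLI may prompt for Schema Registry credentials the first time:

# read the mirrored (read-only) topic on the Azure cluster from the beginning
confluent kafka topic consume <topic_name> --cluster <destination_cluster_id> --value-format avro --from-beginning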

How Does Failover Work?

In our use case, we don’t require failover since the goal is simply to share data with other teams on Azure. That said, the destination cluster contains an exact replica of the source data, which means it could be used for disaster recovery scenarios. In such cases, AWS clients could fail over to the Azure cluster, enabling applications to remain resilient even in the event of a complete cloud provider outage.

With consumer.offset.sync.enable=true, consumer offsets are synced to the destination cluster, making failover seamless. One consideration is that offset sync is asynchronous, so in the event of a sudden outage, the most recent consumer offsets may not have been committed to the destination cluster yet. Therefore, a few messages may be reprocessed. Applications should be designed to handle this small amount of duplication.

Similarly, because Cluster Linking itself is asynchronous, there’s a small chance that some messages written to the source cluster haven’t yet been mirrored to the destination at the time of sudden disaster. Both producer and consumer applications must be tolerant of this.

Failover to the destination cluster is straightforward. Simply switch off the producers and consumers running on AWS, then promote the mirrored topics by running the failover command:

confluent kafka mirror failover <topic_name> ^
  --link cross-cloud-link ^
  --cluster <destination_cluster_id>

After the command succeeds and the mirrored topics are promoted, start the clients on Azure and point them to the new bootstrap server.
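Note that failover is intended for unplanned outages and does not wait for the link to catch up. For a planned migration where both clusters are healthy, the CLI also offers a promote operation, which verifies that the mirror topic has no remaining lag before converting it to a writable topic:

confluent kafka mirror promote <topic_name> ^
  --link cross-cloud-link ^
  --cluster <destination_cluster_id>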

Get Started With Cross-Cloud Cluster Linking

Cluster Linking makes private cross-cloud data replication straightforward: It’s fully managed, preserves offsets, and works seamlessly with private networking across clouds and on-premises. When paired with Schema Linking, consumers can immediately read mirrored topics with full schema support.

In this post, we showed how to replicate data from a private AWS cluster to a private Azure cluster. This setup not only enables seamless cross-cloud data sharing with local reads (reducing egress costs and latency) but can also serve as a disaster recovery solution. The same configuration supports client failover between clusters, helping you achieve low RPO and RTO while keeping applications resilient to cloud outages.

It’s truly never been easier to replicate data across clouds with Confluent Cloud, and we invite you to explore this capability for yourself. To get started, check out our documentation and head to Confluent Cloud to start linking your private clusters today. To try out the demo, head to the GitHub repository.

If you haven’t done so already, sign up for a free trial of Confluent Cloud to explore this feature. New sign-ups receive $400 to spend within Confluent Cloud during their first 30 days. Use the code CCBLOG60 for an additional $60 of free usage.


Apache®, Apache Kafka®, and Kafka® are registered trademarks of the Apache Software Foundation.

  • Ahmed Zamzam leads the Technical Marketing team at Confluent, where he and his team create blogs, demos, videos, workshops, and reference architectures that showcase the power of Confluent’s Data Streaming Platform. With over 15 years of experience spanning Solution Architecture and Technical Marketing, Ahmed has a deep passion for helping organizations unlock the value of real-time data. When he’s not diving into streaming technologies, you’ll likely find him traveling the world, playing tennis, or cycling.

  • Hannah is a product marketer focused on driving adoption of Confluent Cloud. Prior to Confluent, she focused on growth of advertising products at TikTok and container services at AWS.
