How to Securely Connect Confluent Cloud with Services on AWS, Azure, and GCP

The rise of fully managed cloud services fundamentally changed the technology landscape and introduced benefits like increased flexibility, accelerated deployment, and reduced downtime. Confluent offers a portfolio of fully managed connectors that enable quick, easy, and reliable integration of Confluent Cloud with popular data sources and sinks, connecting your entire system in real time. However, the adoption of cloud-based technologies also created opportunities for data breaches, DDoS attacks, and spam when connections are not secure.

With enterprise architectures becoming more complex by the day due to hybrid and multi-cloud environments, as well as the use of multiple vendors for data systems, the path to secure networking isn’t always easy. That’s why a common question we receive from users of our fully managed connectors is how to securely connect to their data sources and sinks. The answer depends on several factors: where the data source/sink is located (on-prem vs. cloud), whether the source/sink is from a cloud provider or a third-party service (AWS/Azure/GCP vs. MongoDB/Snowflake), and whether the user wants to connect over public or private networks.

In this blog post, we will examine two currently supported options for secure networking on Confluent Cloud: securely connecting using a public IP address with static egress IPs (available on AWS clusters) and connecting using a private IP address with VPC peering on all clouds and transit gateway on AWS.

Option 1: Connecting securely to a source/sink using a public IP address

One of the most common scenarios when using fully managed connectors is connecting to a source/sink using its public IP address. To offer additional security when connecting over a public endpoint, we’ve recently launched static egress IPs for Confluent Cloud clusters in AWS that have public internet networking. With this feature, you can get a small list of IP addresses that the fully managed connector will use to connect to your data source or sink. You can then set up a rule to allow access to your sources or sinks only from these IP addresses. By doing so, you can add an extra layer of security and drastically reduce the attack surface for your source or sink systems.

Using IP filtering to secure your source/sink systems is relatively easy to set up, as many managed data systems today let you restrict which IP addresses can access them. Even with self-managed data systems, such as a database you run yourself in the cloud or in an on-premises environment, you can use firewall rules to limit access to connections from specific IP addresses.
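To make the idea of IP filtering concrete, here is a minimal sketch of the check that an allowlist performs on each incoming connection. It is only an illustration of the concept, not how any particular firewall or managed service implements it. The function name `is_allowed` and the addresses (taken from the reserved documentation ranges) are hypothetical.

```python
import ipaddress

def is_allowed(source_ip, allowlist):
    """Return True if source_ip matches any allowlisted address or CIDR block."""
    addr = ipaddress.ip_address(source_ip)
    # A bare address such as "198.51.100.10" is treated as a single-host network.
    networks = [ipaddress.ip_network(entry, strict=False) for entry in allowlist]
    return any(addr in net for net in networks)

# Hypothetical egress IPs; replace with the list shown for your own cluster.
EGRESS_IPS = ["198.51.100.10", "198.51.100.11", "203.0.113.0/28"]

print(is_allowed("198.51.100.10", EGRESS_IPS))  # connector traffic
print(is_allowed("192.0.2.55", EGRESS_IPS))     # any other origin
```

Everything outside the allowlist is rejected, which is what shrinks the attack surface when the source or sink still has a public endpoint.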

The step-by-step guide below shows you how to obtain the static egress IP addresses and configure your data source/sink systems to accept connections only from these IP addresses. For this example, we will read data from an Amazon RDS for PostgreSQL database and sink this data into Snowflake. We will configure both the source and the sink to allow access only from the static egress IPs provided by Confluent.

Here are the prerequisites to follow this guide:

  • An Apache Kafka® cluster in Confluent Cloud on AWS with the Internet networking type
  • An instance of Amazon RDS for PostgreSQL with sample data
  • A Snowflake instance
Note
You can use the promo code CL60BLOG for an additional $60 of free Confluent Cloud usage.*

Once the Kafka cluster is created on Confluent Cloud, navigate to the networking page by clicking on the Cluster Overview > Networking link in the sidebar.

List of IPs under the “Egress IPs” section

Once you are on the networking page, you should see a list of IPs under the “Egress IPs” section. If you don’t see this section, make sure your cluster is on AWS and configured with the “Internet” networking type. Copy the IP addresses listed there, as we will add this list to our data source and sink to enable access from the fully managed connectors.

If your Confluent Cloud cluster is on Azure or GCP, for which we don’t yet support static egress IPs, you can look up the IP addresses for the cloud region in which your Confluent Cloud cluster is located and use those as the IP allowlist. This method is not foolproof, as the IP address range is much broader than a list of static egress IPs, but it still considerably narrows the range of addresses from which an attack could originate.
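As a rough sketch of that lookup, the snippet below filters a cloud provider's published IP range document down to one region. It assumes the document has the shape Google Cloud uses for its published ranges file (a `prefixes` list whose entries carry `ipv4Prefix` or `ipv6Prefix` plus a `scope` naming the region). The sample data is made up from documentation address ranges, and the function name `prefixes_for_region` is hypothetical. Azure publishes its ranges in a different layout, so the field names would need adjusting there.

```python
import json

def prefixes_for_region(ranges_doc, region):
    """Collect the CIDR blocks whose 'scope' matches the given region."""
    cidrs = []
    for entry in ranges_doc.get("prefixes", []):
        if entry.get("scope") != region:
            continue
        # Each entry carries either an IPv4 or an IPv6 prefix.
        cidr = entry.get("ipv4Prefix") or entry.get("ipv6Prefix")
        if cidr:
            cidrs.append(cidr)
    return cidrs

# Made-up sample standing in for the downloaded ranges file (assumed shape).
SAMPLE_RANGES = json.loads("""
{
  "prefixes": [
    {"ipv4Prefix": "198.51.100.0/25", "service": "Google Cloud", "scope": "us-central1"},
    {"ipv4Prefix": "203.0.113.0/24", "service": "Google Cloud", "scope": "europe-west1"},
    {"ipv6Prefix": "2001:db8:1000::/48", "service": "Google Cloud", "scope": "us-central1"}
  ]
}
""")

print(prefixes_for_region(SAMPLE_RANGES, "us-central1"))
```

The resulting list can be pasted into the same allowlist settings described in the steps that follow, keeping in mind that it covers every tenant in the region rather than only your connectors.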

Now that we have the list of egress IPs, we can configure the Amazon RDS for PostgreSQL instance to allow access from these IP addresses. First, ensure the PostgreSQL instance is configured with public access. Next, allow inbound access to the PostgreSQL database port (5432 by default) for the static egress IP addresses. To do so, edit the inbound rules for the security group associated with the PostgreSQL instance. Create a new inbound rule with the type set to “PostgreSQL” and the source set to “custom”, and then paste in the IP addresses from the list copied earlier, separated by commas, as shown in the image below.

Create a new inbound rule with type “PostgreSQL” source set to “custom” and then paste in the IP addresses from the list

If you are connecting to a different source or sink system, you would follow similar instructions for that system to grant access to the egress IP addresses.
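If you prefer to script the RDS inbound rule instead of using the console, the sketch below builds the rule in the `IpPermissions` structure that the EC2 API expects for security group ingress. The helper name `postgres_ingress_permissions`, the security group ID, and the IP addresses are hypothetical, and the actual API call is left as a comment so the snippet runs without AWS credentials.

```python
import ipaddress

def postgres_ingress_permissions(egress_ips, port=5432):
    """Build one TCP ingress rule that admits only the given IPv4 addresses."""
    ranges = []
    for ip in egress_ips:
        # Normalizes "198.51.100.10" to "198.51.100.10/32"; rejects invalid input.
        network = ipaddress.IPv4Network(ip, strict=False)
        ranges.append({
            "CidrIp": str(network),
            "Description": "Confluent Cloud egress IP",
        })
    return [{
        "IpProtocol": "tcp",
        "FromPort": port,
        "ToPort": port,
        "IpRanges": ranges,
    }]

# Hypothetical egress IPs copied from the cluster's networking page.
permissions = postgres_ingress_permissions(["198.51.100.10", "198.51.100.11"])
print(permissions)

# With boto3 installed and credentials configured, the rule could be applied as:
#   import boto3
#   boto3.client("ec2").authorize_security_group_ingress(
#       GroupId="sg-EXAMPLE",  # hypothetical security group ID
#       IpPermissions=permissions,
#   )
```

Each address becomes a /32 range, so the rule admits exactly the listed hosts on the database port and nothing else.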

Once the rule has been added, save the rules. Now in Confluent Cloud, create a new PostgreSQL source connector for your Kafka cluster using the public hostname of the PostgreSQL database. Since the egress IP addresses have already been granted access, your fully managed connector will now be able to securely connect to the database without having to fully expose your database to the entire internet.

Now that we have the PostgreSQL source connector set up, we need to configure the Snowflake instance to allow access from the static egress IPs provided by Confluent. To do so, create a new network policy that allows access from the list of IPs shown in the Cluster Overview > Networking section, and assign it to the Snowflake account. When creating the network policy, you need to use a role that has the right permissions, such as the SECURITYADMIN role. Make sure to review the usage notes in Snowflake’s documentation before creating and assigning the network policy.

Snowflake management console showing the network policy creation
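The same network policy can also be created with SQL rather than through the console. The sketch below only generates the statements as strings so you can review them before running them in a Snowflake worksheet. The policy name `confluent_egress`, the helper name `network_policy_statements`, and the IP addresses are hypothetical.

```python
import ipaddress
import re

def network_policy_statements(policy_name, egress_ips):
    """Return the SQL statements that create and assign an allowlist policy."""
    # Keep the policy name to a plain unquoted identifier.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", policy_name):
        raise ValueError(f"unsupported policy name: {policy_name!r}")
    for ip in egress_ips:
        # Raises ValueError if an entry is not a valid IPv4 address or CIDR block.
        ipaddress.IPv4Network(ip, strict=False)
    ip_list = ", ".join(f"'{ip}'" for ip in egress_ips)
    return [
        "USE ROLE SECURITYADMIN;",
        f"CREATE NETWORK POLICY {policy_name} ALLOWED_IP_LIST = ({ip_list});",
        f"ALTER ACCOUNT SET NETWORK_POLICY = {policy_name};",
    ]

# Hypothetical egress IPs copied from the cluster's networking page.
for statement in network_policy_statements(
    "confluent_egress", ["198.51.100.10", "198.51.100.11"]
):
    print(statement)
```

Because an account-level policy applies to every connection to the account, check that the allowlist also covers any other addresses that legitimately need access before running the final statement.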

Once the network policy has been created and assigned, you can then finish configuring the Snowflake sink connector on Confluent Cloud to connect to the public endpoint of your Snowflake instance and launch the connector. Once launched, the sink connector will be able to securely connect to the Snowflake instance and will be able to write the records from the configured Kafka topic.

Option 2: Connecting to a source/sink using a private IP address

For users who want to connect to their data systems using a private IP address, there are a couple of options depending on the source/sink system you are connecting to and the networking type of your Confluent Cloud cluster. While the setup is more complex, connecting via a private IP address offers additional security because data does not traverse the public internet and you don’t have to expose your data system to the internet.

To connect over a private IP address, your Confluent Cloud cluster must be set up with VPC peering (available on AWS and GCP; VNet peering on Azure) or transit gateway (AWS only). There are two common patterns for connecting over private IP addresses that are currently supported.

The first pattern is when the data source/sink resides in your VPC, which is peered with Confluent Cloud. In this case, the fully managed connector connects to the source/sink using a private IP address over the peering connection. This networking pattern can be used for source/sink systems offered by the cloud providers, such as Amazon Redshift, Azure Cosmos DB, and Google BigQuery, and for self-managed sources/sinks that are hosted in the cloud.

Data source/sink residing in your VPC

The second supported pattern is to create a private link endpoint to the data source/sink in the VPC that is peered with Confluent Cloud. This enables you to connect to third-party managed sources/sinks such as MongoDB Atlas. Note that the third-party service must support connectivity either via a publicly resolvable DNS name for the private link endpoint or via the private IP address of the endpoint. The picture below illustrates this type of connectivity.

Private link endpoint to the data source/sink in the VPC

Summary

In this blog post, we covered various ways to use fully managed connectors on Confluent Cloud to securely connect to your existing data systems on AWS, Azure, and GCP, as well as those hosted on-premises. The newly released static egress IPs feature for Confluent Cloud clusters on AWS makes it easy to secure connections to data systems such as Snowflake, Redshift, MongoDB, and Elasticsearch even when using public endpoints. For customers unable to use public endpoints, there are ways to connect over a private network by leveraging networking options such as peering or transit gateway. Confluent is also working closely with AWS, Azure, and GCP to provide additional secure networking options in the future.

If you are not already using our fully managed connectors, you can get started by signing up for a free trial of Confluent Cloud and start using one of our fully managed connectors. You can use the promo code CL60BLOG for an additional $60 of free usage.*

Shiva Mogili is a senior product manager on the Kafka Connect team focusing on the Kafka Connect framework, Connect common services, and Confluent Hub. Prior to Confluent, Shiva led product management for Hitachi Vantara’s IoT Edge Analytics product and was a part of GoPro’s aerial products team. He has an MBA from the University of North Carolina-Chapel Hill and a BS in Engineering from the University of Illinois at Urbana-Champaign.
