How Let’s Encrypt Powers Confluent Cloud to Automate Its Certificate Operations

Since the start of our cloud journey, we have relied heavily on Let's Encrypt because it is reliable, fully automated, open, and free. Today, we're proud to become an official sponsor of Let's Encrypt. In this blog post, we celebrate the occasion by describing our journey with Let's Encrypt, how we integrate with their service, and why we chose them.

Our cloud platform was initially simpler, so we started with a simple approach to certificate management and to integrating with Let's Encrypt. We issued a single certificate per cloud-provider region, used it for all connections in that region, and renewed it every 60 days. Certificate management was handled within a monolithic system that was also responsible for provisioning resources such as clusters, networks, and secrets.

As our product evolved to support new use cases such as Confluent Cloud Networks and Private DNS Resolution, our certificate management strategy needed to evolve as well.

For our new certificate management system, we had the following goals:

  • Support certificates for diverse endpoint structures, depending on the network connectivity chosen by customers

  • Extract certificate management out of our original monolith into one or more separate services for improved velocity, reliability, and operational efficiency

  • Move the process of obtaining certificates from the CA (Let's Encrypt), which involves high-latency operations like DNS challenges, out of the critical path for Confluent Cloud resource provisioning

  • Allow support for additional certificate providers in the future if needed (as long as they are ACME protocol compatible)

To meet these goals, we designed the new system to be much more configurable. Based on the network connectivity a customer chooses for connecting to our cloud platform, we provide cluster endpoints that are compatible with that connectivity. This affects the certificates we use, particularly each certificate's Subject Alternative Name (SAN) list. For instance, our Private DNS solution requires zonal endpoints in the SAN list. We support this range of SAN lists through the notion of "certificate schemes," which define templates for certificate domain lists (SAN lists). The certificate management service uses these templates to materialize the full domain list for each certificate from values such as cloud, region, and network ID. It then procures the certificates and pools them in a secret store ahead of time, ready to be associated with clusters. Certificates are pooled by template, region, and so on, and each pool's size is configurable according to the traffic and demand in each Confluent Cloud region.
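To make the idea of a certificate scheme more concrete, here is a minimal sketch of how a template could be expanded into a SAN list. It is an illustration only, not the service's actual code: the `CertificateScheme` class, the `materialize_sans` function, the `$`-style placeholder syntax, and the `example.test` domain are all assumptions made for this example.

```python
from __future__ import annotations

from dataclasses import dataclass
from string import Template


@dataclass(frozen=True)
class CertificateScheme:
    """A named set of SAN templates (hypothetical shape, for illustration)."""
    name: str
    san_templates: tuple[str, ...]


def materialize_sans(scheme, *, region, network_id, base_domain, zones=()):
    """Expand a scheme's templates into a concrete, de-duplicated SAN list."""
    sans = []
    for raw in scheme.san_templates:
        # Templates that mention $zone are expanded once per zone;
        # all other templates are expanded exactly once.
        zone_values = zones if "$zone" in raw else (None,)
        for zone in zone_values:
            values = {
                "region": region,
                "network_id": network_id,
                "base_domain": base_domain,
            }
            if zone is not None:
                values["zone"] = zone
            name = Template(raw).substitute(values)
            if name not in sans:
                sans.append(name)
    return sans


# A made-up scheme with one regional and one zonal wildcard endpoint.
private_dns = CertificateScheme(
    name="private-dns",
    san_templates=(
        "*.$network_id.$region.$base_domain",
        "*.$zone.$network_id.$region.$base_domain",
    ),
)
```

Calling `materialize_sans(private_dns, region="us-west-2", network_id="n-abc123", base_domain="example.test", zones=("az1", "az2"))` would yield one regional wildcard name followed by one zonal wildcard name per zone.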

We use the Lego library, an implementation of the ACME protocol, to integrate with Let's Encrypt and to handle challenges. By building on the abstractions Lego provides, we stay independent of the specific DNS service implementations of the various cloud providers we work with. This approach has also enabled us to fully automate periodic certificate renewals. Although Let's Encrypt is our primary certificate provider, the flexibility of the ACME protocol allows us to integrate with a backup certificate authority if necessary.
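Because every ACME-compatible CA is reached through a directory URL, falling back to a second provider can be expressed as trying an ordered list of directories. The sketch below shows one possible shape for that logic and makes no claim about how it is wired up in practice. The `issue` callable stands in for the real ACME client (Lego in our case), and `BACKUP_DIRECTORY`, `IssuanceError`, and `issue_with_fallback` are hypothetical names; only the Let's Encrypt production directory URL is real.

```python
from typing import Callable, Sequence, Tuple

# Let's Encrypt's production ACME v2 directory.
LETS_ENCRYPT_DIRECTORY = "https://acme-v02.api.letsencrypt.org/directory"
# Hypothetical backup CA, used only for illustration.
BACKUP_DIRECTORY = "https://acme.backup-ca.example/directory"


class IssuanceError(Exception):
    """Raised when a certificate could not be obtained."""


def issue_with_fallback(
    domains: Sequence[str],
    directories: Sequence[str],
    issue: Callable[[str, Sequence[str]], str],
) -> Tuple[str, str]:
    """Try each ACME directory in order; return (directory_url, certificate).

    `issue` is a stand-in for the ACME client call that performs the
    order, the DNS challenge, and the certificate download.
    """
    errors = []
    for directory_url in directories:
        try:
            return directory_url, issue(directory_url, domains)
        except IssuanceError as exc:
            # Remember the failure and move on to the next provider.
            errors.append(f"{directory_url}: {exc}")
    raise IssuanceError("all ACME directories failed: " + "; ".join(errors))
```

Keeping the provider list as plain data means adding or reordering CAs is a configuration change rather than a code change, provided each CA speaks ACME.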

We continuously monitor the active certificates under management and trigger a renewal attempt once a certificate is within 30 days of expiring. After a successful renewal, an event is published and the updated certificate is promptly synchronized to the target clusters.
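The renewal rule itself is simple date arithmetic. The following sketch assumes certificates are tracked as a mapping from an identifier to an expiry timestamp; the function names, the data shape, and the choice to treat exactly 30 days remaining as "due" are assumptions made for this example.

```python
from datetime import datetime, timedelta, timezone

# Renew once a certificate is within 30 days of expiring.
RENEWAL_WINDOW = timedelta(days=30)


def needs_renewal(not_after, now, window=RENEWAL_WINDOW):
    """Return True if the certificate expires within `window` of `now`.

    Already-expired certificates also qualify, because the remaining
    lifetime is then negative.
    """
    return not_after - now <= window


def due_for_renewal(certificates, now):
    """Return the sorted IDs of all certificates that should be renewed."""
    return sorted(
        cert_id
        for cert_id, not_after in certificates.items()
        if needs_renewal(not_after, now)
    )


now = datetime(2024, 1, 1, tzinfo=timezone.utc)
certificates = {
    "cert-a": now + timedelta(days=45),  # still fresh
    "cert-b": now + timedelta(days=30),  # exactly on the boundary
    "cert-c": now + timedelta(days=5),   # close to expiry
    "cert-d": now - timedelta(days=1),   # already expired
}
due = due_for_renewal(certificates, now)
```

With these inputs, `due` contains `cert-b`, `cert-c`, and `cert-d`, while `cert-a` is left alone until it enters the 30-day window.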

Centralized certificate management in Confluent Cloud brings several benefits. By pre-provisioning a pool of certificates ahead of other Confluent Cloud resources, we keep the latency of per-domain DNS challenges from slowing down the provisioning of resources such as networks. We are also insulated against temporary outages at certificate providers and other issues in certificate provisioning. Automated certificate monitoring and renewal mitigate the risk of serving expired certificates, ensuring uninterrupted connectivity for customers. Centralized management also streamlines certificate lifecycle monitoring and renewal, improving overall operational efficiency.
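The pooling behavior can be pictured as a set of queues keyed by scheme and region, each with a configurable target size. This is a simplified in-memory sketch under that assumption; the `CertificatePool` class and its methods are hypothetical, and a real service would keep the certificates in a secret store rather than in process memory.

```python
from collections import defaultdict, deque


class CertificatePool:
    """Pre-provisioned certificates, pooled per (scheme, region) key."""

    def __init__(self, target_sizes):
        # target_sizes maps a pool key to the number of ready certificates
        # we want to keep on hand for that key.
        self._target_sizes = dict(target_sizes)
        self._ready = defaultdict(deque)

    def deficit(self, key):
        """How many certificates must be procured to reach the target size."""
        return max(0, self._target_sizes.get(key, 0) - len(self._ready[key]))

    def add(self, key, cert_id):
        """Register a freshly procured certificate as ready for use."""
        self._ready[key].append(cert_id)

    def claim(self, key):
        """Hand out the oldest ready certificate, or None if the pool is empty."""
        ready = self._ready[key]
        return ready.popleft() if ready else None


key = ("private-dns", "us-west-2")
pool = CertificatePool({key: 2})
```

A background refill loop would procure `pool.deficit(key)` certificates whenever that number is above zero, so that `claim` can normally return immediately during resource provisioning without waiting on a DNS challenge.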

Our new centralized certificate management service simplifies network security and improves operational efficiency. By defining certificate schemes, generating domain lists, and pooling certificates, Confluent Cloud ensures secure connections for its network-bound services. The integration with the Lego library streamlines certificate procurement and management across multiple cloud providers. With centralized certificate management in place, Confluent Cloud is well-prepared to support evolving network access models while maintaining a robust security posture.

As the original author of the ACME automation standard, Let's Encrypt has established itself as one of the most innovative certificate authorities and one of the most robust platforms to rely upon. Because it supports advanced certificate configurations like those required for Confluent Cloud and issues widely trusted certificates, Let's Encrypt has been the logical choice of certificate provider for Confluent. Today, we integrate with Let's Encrypt's APIs to obtain tens of thousands of TLS/SSL certificates every week in a fully automated fashion. This allows our team to spend less time managing certificates and more time building advanced networking features for our customers.

To see these certificates in action, you can check out Confluent Cloud, a fully managed event streaming service based on Apache Kafka.

  • Emmanuel Bertrand is a group product manager at Confluent, where he is responsible for the Confluent Cloud platform’s control plane. Prior to Confluent, Emmanuel worked at Microsoft as a product manager for over 10 years. Originally from France, he now lives in Seattle, WA.

  • Roger began his career at E*Trade Financial focused on their core microservice infrastructure. He was a Principal Engineer at a mobile marketing startup acquired by HelloWorld and at Palo Alto Research Center, where he first started using and contributing to Apache Kafka and related projects like Camus. Before joining Confluent, he was responsible for architecting the stream data infrastructure and building high-throughput, real-time intelligence applications for E*Trade.

  • Shahzeb Patel is a staff software engineer at Confluent, working with the Control Plane team. His primary focus areas include infrastructure services responsible for resource provisioning and life-cycle management in Confluent Cloud. Prior to joining Confluent, Shahzeb worked as a software engineer at Dropbox and VMware.
