Confluent Cloud Is Now 100% KRaft and You Should Be Too

We are now in the final chapter of the multi-year journey of Apache Kafka® to remove Apache ZooKeeper™ and fully transition to self-managed metadata in KRaft. Many Kafka users and customers are beginning to migrate to KRaft and are eager to understand its performance characteristics in production environments. The most pressing question is: "Is it safe to move off ZooKeeper?"

Confluent has just completed the largest known migration of Kafka clusters to KRaft, moving thousands of clusters to KRaft without downtime. With KRaft, we've streamlined our architecture, improved our scalability, and stabilized operations across our cloud clusters.

Now, it’s your turn to benefit from KRaft’s performance and scalability enhancements. Whether you operate a single cluster or hundreds, KRaft is ready for production use, and your migration journey can begin with confidence.

Background on KRaft

KRaft is Kafka's new consensus protocol that replaces ZooKeeper for metadata management and cluster consensus. This shift marks a significant architectural change in Kafka. Metadata, such as information about topics, brokers, and partitions, was previously managed externally in ZooKeeper. However, as Kafka clusters grew in size and complexity, the need to streamline the system became apparent.

The core idea behind KRaft is to internalize the management of metadata within Kafka itself using a Raft-based quorum consensus protocol. With this, Kafka has gained the benefit of tailor-made metadata APIs and storage formats. This has significantly increased the scalability of metadata within Kafka, which in turn increases the scalability of Kafka itself.

The journey to remove ZooKeeper has been a multi-year effort, beginning with the introduction of KRaft in Kafka 2.8 as an experimental feature, and progressing through several iterations and improvements. KRaft became production-ready in Kafka 3.3. As of Kafka 3.7, the migration process from ZooKeeper to KRaft is also production-ready.

ZooKeeper mode was deprecated in Kafka 3.5. With Kafka 4.0 and Confluent Platform 8.0, Kafka will complete its journey to KRaft by removing the deprecated ZooKeeper mode, eliminating the operational overhead of ZooKeeper while enhancing Kafka’s ability to support larger and more complex environments.

With KRaft, Kafka can now rely on a more unified, resilient system for managing metadata, supporting millions of partitions, improving controller failover times, and streamlining security models. This transformation not only improves Kafka’s scalability but also makes it easier to operate, reducing the complexity of Kafka deployments.

Confluent Cloud’s migration to KRaft

Migrating thousands of clusters to KRaft in Confluent Cloud was one of the most significant operational challenges we have ever faced. The migration covered everything from small development clusters to very high-throughput clusters and mission-critical production systems. It also included extremely large multi-tenant clusters with tens of thousands of partitions and dozens of brokers. The stakes were extremely high. The main objectives were to ensure that the migration was safe and reversible, introduced no downtime, and upheld Confluent’s stringent SLA commitments.

Technical challenges

Thousands of clusters: The scale of Confluent Cloud meant that thousands of clusters had to transition from ZooKeeper to KRaft without any service interruption. We set a goal for ourselves to make the migration no more impactful than a typical controller failover. This required meticulous planning, development, and execution to ensure that every cluster was transitioned smoothly and safely.

High-throughput multi-tenant clusters: Some of our most complex environments include multi-tenant clusters, where different users share infrastructure. Ensuring that KRaft would handle these workloads efficiently was a priority. Our experience migrating these clusters proves that KRaft’s scalability and resilience are ready for prime time.

KRaft benefits post cloud migration

Enhanced scalability: KRaft’s quorum-based consensus model allows us to scale efficiently. By consolidating metadata management within Kafka itself, KRaft enables us to handle millions of partitions across a fleet of clusters with improved efficiency.

Improved operational stability: Migrating to KRaft took the better part of a year, but it has paid off by simplifying our cloud operations. With ZooKeeper eliminated, we’ve reduced the complexity of managing our clusters, and this simplification has resulted in better overall system stability.

Are you ready to migrate?

The answer is “yes”!

We believe you can approach your own migration to KRaft with confidence because we’ve already completed the largest known migration. Confluent has migrated all its cloud clusters to KRaft without any impact on customer SLAs. This seamless transition demonstrates KRaft's readiness to handle production workloads at scale, and you can trust that KRaft is battle-tested in one of the most demanding environments: Confluent Cloud.

Preparing for Kafka 4.0 and Confluent Platform 8.0

With KRaft becoming the sole metadata layer in Apache Kafka 4.0 and Confluent Platform 8.0, users need to migrate before upgrading. Confluent recommends using the latest releases in the 3.7, 3.8, or 3.9 branches for a smooth transition. Confluent has designed the migration to resemble a typical Kafka upgrade, which makes it straightforward for experienced operators.

Migration from ZooKeeper to KRaft is done through a series of configuration changes and rolling restarts. First, the existing cluster ID is obtained so the new KRaft quorum can be provisioned appropriately. Next, the KRaft controller quorum is provisioned. The brokers are then reconfigured to communicate with KRaft and restarted one by one. Once all of the brokers have been restarted, the metadata migration happens automatically. 
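The configuration changes for this first phase can be sketched in Python. This is a minimal illustration rather than Confluent's tooling: the helper names are hypothetical, the host names and node ID are placeholders, and the property names follow the Apache Kafka ZooKeeper-to-KRaft migration documentation, so verify them against the docs for your release. Listener, security, and inter-broker protocol settings are omitted for brevity.

```python
# Sketch only: helper names are hypothetical, and the property names should be
# checked against the Apache Kafka migration docs for the release you run.


def controller_migration_config(node_id, quorum_voters, zk_connect):
    """Properties for a new KRaft controller that will drive the migration."""
    return {
        "process.roles": "controller",
        "node.id": str(node_id),
        "controller.quorum.voters": quorum_voters,
        "controller.listener.names": "CONTROLLER",
        "zookeeper.metadata.migration.enable": "true",
        "zookeeper.connect": zk_connect,
    }


def broker_migration_config(broker_props, quorum_voters):
    """Copy a ZooKeeper-mode broker config and add the migration settings.

    The broker keeps its ZooKeeper connection; it only learns where the
    KRaft controller quorum lives. The input dict is left unmodified.
    """
    updated = dict(broker_props)
    updated.update({
        "zookeeper.metadata.migration.enable": "true",
        "controller.quorum.voters": quorum_voters,
        "controller.listener.names": "CONTROLLER",
    })
    return updated


def to_properties(props):
    """Render a dict as sorted key=value lines, as in a .properties file."""
    return "\n".join(f"{key}={value}" for key, value in sorted(props.items()))


if __name__ == "__main__":
    voters = "3000@controller1:9093"  # placeholder quorum of one controller
    zk_broker = {"broker.id": "0", "zookeeper.connect": "zk1:2181"}
    print(to_properties(controller_migration_config(3000, voters, "zk1:2181")))
    print()
    print(to_properties(broker_migration_config(zk_broker, voters)))
```

Each broker would be restarted with its updated properties one at a time, which is what keeps the change comparable to an ordinary rolling upgrade.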

At this point, the system writes metadata to both ZooKeeper and KRaft. We call this “dual-write” mode. The purpose of this state is to allow the operator to safely roll back to ZooKeeper if any problem is encountered.

Another round of reconfigurations and restarts of the brokers will bring the cluster fully into KRaft mode. One final rolling restart of the controllers will finalize the migration.
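The overall sequence can be modeled as a small state machine. The sketch below is only an illustration: the phase names are ours, not Kafka's, and it assumes that dual-write stays active from the automatic metadata migration until the controllers are finalized, and that rollback is no longer possible after finalization. Check both assumptions against the migration documentation for your release.

```python
from enum import Enum


class Phase(Enum):
    """Illustrative migration phases; the names are hypothetical."""
    ZOOKEEPER = 1           # original ZooKeeper-mode cluster
    QUORUM_PROVISIONED = 2  # KRaft controllers running, brokers untouched
    DUAL_WRITE = 3          # brokers restarted, metadata migrated automatically
    BROKERS_ON_KRAFT = 4    # brokers reconfigured and restarted in KRaft mode
    FINALIZED = 5           # controllers restarted, migration finalized


# Assumption: metadata is written to both stores in these phases.
DUAL_WRITE_PHASES = {Phase.DUAL_WRITE, Phase.BROKERS_ON_KRAFT}


def next_phase(phase):
    """Advance one step; the phases only ever move forward."""
    if phase is Phase.FINALIZED:
        raise ValueError("migration is already finalized")
    return Phase(phase.value + 1)


def is_dual_write(phase):
    """True while metadata goes to both ZooKeeper and KRaft."""
    return phase in DUAL_WRITE_PHASES


def can_roll_back(phase):
    """Assumption: rollback to ZooKeeper is possible until finalization."""
    return phase is not Phase.FINALIZED


if __name__ == "__main__":
    phase = Phase.ZOOKEEPER
    while True:
        print(f"{phase.name}: dual_write={is_dual_write(phase)}, "
              f"rollback={can_roll_back(phase)}")
        if phase is Phase.FINALIZED:
            break
        phase = next_phase(phase)
```

Modeling finalization as a one-way step makes the operational point explicit: validate the cluster while rollback is still available, and restart the controllers for the last time only once you are satisfied.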

Detailed instructions are available to help with this process.

Automated migration with Confluent for Kubernetes (CFK)

If you're using Kubernetes, Confluent for Kubernetes (CFK) provides an automated way to handle the migration from ZooKeeper to KRaft. CFK simplifies this process with the following steps:

  1. Deploy KRaft controllers: Use CFK's custom resource definitions (CRDs) to deploy KRaft controllers. You need at least three KRaft controller replicas to establish quorum.

  2. Configure CRDs for migration: CFK handles much of the configuration work, including locking ZooKeeper and Kafka resources during the migration. Ensure that webhooks are enabled to enforce these locks.

  3. Perform the migration: CFK automatically retrieves the Kafka cluster ID and executes the migration through its declarative API. Once the migration is complete, manually remove the ZooKeeper cluster if it's no longer in use.

For more detailed examples and workflows, you can explore Confluent’s GitHub.
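The three-replica minimum in step 1 follows from how majority quorums work in Raft-style protocols. The sketch below shows the general arithmetic only; it is not CFK or Kafka code, and the function names are hypothetical.

```python
def majority(voters):
    """Smallest number of voters that forms a majority of the quorum."""
    if voters < 1:
        raise ValueError("a quorum needs at least one voter")
    return voters // 2 + 1


def tolerated_failures(voters):
    """How many voters can fail while a majority remains available."""
    return voters - majority(voters)


if __name__ == "__main__":
    for n in (1, 3, 5):
        print(f"{n} voters: majority={majority(n)}, "
              f"tolerates {tolerated_failures(n)} failure(s)")
    # 1 voters: majority=1, tolerates 0 failure(s)
    # 3 voters: majority=2, tolerates 1 failure(s)
    # 5 voters: majority=3, tolerates 2 failure(s)
```

Three is the smallest controller count that survives the loss of one node, and an even count adds no extra tolerance over the odd count below it.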

Automated migration with Ansible Playbooks

Confluent also offers a set of Ansible Playbooks for automating the migration in traditional on-prem or cloud environments. These playbooks simplify the following:

  1. Automated configuration: Ansible automates the configuration of KRaft controllers and Kafka brokers, ensuring that they’re ready for the migration.

  2. Orchestration of rolling restarts: The playbooks handle rolling restarts of brokers and controllers, allowing for a zero-downtime migration from ZooKeeper to KRaft.

  3. Validation and finalization: After migrating, Ansible playbooks can validate the migration’s success by ensuring that all metadata has been transferred to the new KRaft quorum.

Conclusion

KRaft is the future of Kafka metadata management, and the time to migrate is now. Confluent Cloud’s seamless transition to KRaft demonstrates the power of this new protocol. Whether you prefer manual migration or automation via CFK and Ansible Playbooks, transitioning to KRaft will ensure your Kafka clusters are ready for the future. Look out for our upcoming technical deep dive on exactly how to migrate from ZooKeeper to KRaft.

Want to take advantage of the benefits of KRaft without the management? Get started on Confluent Cloud today and create a free cluster. For Confluent Platform users, download the latest version and stay tuned for more new updates and enhancements coming with Confluent Platform 8.0.

Apache®, Apache ZooKeeper®, Apache KRaft are trademarks of the Apache Software Foundation.

  • Chase Thomas is a Group Product Manager at Confluent where he focuses on Apache Kafka. Prior to Confluent he held product roles at Splunk and at AWS Managed Streaming for Apache Kafka (MSK). He started his career building real-time instrumentation systems on dams across California. Chase has an MS-MBA from BYU and the Marriott School of Business. In his free time, you’ll find Chase fishing in the outdoors with his family.

  • David Arthur is a software engineer on the Core Kafka Team at Confluent. He has 10 years of experience designing and developing software for a wide variety of industries. David was an early user of Kafka and became a committer around the time Kafka became a top-level project at the Apache Software Foundation. He also authored a popular Python client for Kafka which received wide adoption, although he now recommends Confluent’s client 😊. Apart from software and open source, David enjoys gardening, operating amateur radio, and spending time with his wife and three children.