Confluent Platform

Dawn of Kafka DevOps: Managing Kafka Clusters at Scale with Confluent Control Center

Tim BerglundViktor Gamov
Last Updated: 

When managing Apache Kafka® clusters at scale, tasks that are simple on small clusters turn into significant burdens. To be fair, a lot of things turn into significant burdens at scale, and it’s Confluent Control Center’s job to ease as many of them as possible. In Confluent Platform 5.2, Control Center has grown a couple of new features that make large deployments a little more pleasant to manage: It has become much better at managing configuration changes among a large number of brokers, and it scales to a larger number of managed partitions. Let us explain these two in a bit of detail.

Dynamic broker configuration

Two challenges present themselves when configuring a large number of brokers: visualizing the differences in broker configuration and modifying them without resorting to rolling restarts everywhere. These are problems endemic to any distributed system that uses a large number of the same node type, and they become more costly as the number of nodes grows. Confluent Platform 5.2 makes solving both of these problems a bit easier.

In previous versions of Control Center, you could view and download broker configurations, which was good as far as it went. If you wanted to compare broker configurations (hardly an unusual thing to do when trying to troubleshoot a misbehaving cluster), you were left to do the diffing on your own. As of 5.2, you can now directly view the differences in configurations from within Control Center.

Confluent Control Center

Relatedly, KIP-226 enabled dynamic broker reconfiguration since Apache Kafka 1.1. Put this together with the configuration view, and you now have a powerful way to get misconfigured brokers whipped into shape. Not only can you view their configurations side by side, but you can also make changes to parameters as needed without having to restart the brokers, which is an enormous time savings no matter how you manage your deployment. Of course, not every broker configuration parameter can be changed dynamically. See the documentation (or, if you please, the Apache Kafka wiki) for a complete list of which parameters this applies to.

Confluent Control Center

Dynamic broker configuration is enabled by default. To disable, set confluent.controlcenter.broker.config.edit.enable=false in the control-center.properties. And remember, you can always consult the documentation for more information.

Improved scalability

Control Center gives you visibility into the behavior of each topic that it manages, which means it has to keep track of the state of each partition in each topic. (No, this is not a terribly surprising claim of software architecture, but work with me here.) This, in turn, implies that Control Center has its own upper bounds on how large a deployment it can manage. Just how many topics Control Center can handle turns out to be somewhat of a complicated question, depending on replication factor and the distribution of topic partition counts. A cluster may have many topics with a small number of partitions, a small number of topics with many partitions or something in between; and a cluster may have a large replication factor or a small one.

Those statistical variations notwithstanding, Control Center has leveled up significantly in Confluent Platform 5.2. It can now comfortably handle around 120,000 individual partition replicas. Assuming a very typical replication factor of three, this means it can now manage around 40,000 individual partitions. If you take a typical topic partition count of six, this translates to a cluster of about 6,700 topics—which, you might be thinking, is a lot. Most users of Control Center don’t manage deployments that large; in fact, our data tells us that this covers more than 90% of existing customer workloads.

Cluster with Partitions

Operating large systems is always a lot of work, and Confluent Platform is singularly focused on making it easier for you. Confluent Platform 5.2 has moved the ball forward in two areas: Control Center’s ability to manage larger clusters, and its ability to help you more easily compare broker configurations and dynamically change them. If you’re running a big cluster, these features will make life that much more pleasant for you.

If you’re still not using Control Center or other parts of the Confluent Platform, you know what to do! Go here, download and check it out.

Other articles in this series:

Tim Berglund is a teacher, author and technology leader with Confluent, where he serves as the Senior Director of Developer Experience. He can frequently be found at speaking at conferences in the U.S. and all over the world. He is the co-presenter of various O’Reilly training videos on topics ranging from Git to distributed systems, and is the author of “Gradle Beyond the Basics.” He tweets as @tlberglund and lives in Littleton, CO, U.S., with the wife of his youth and their youngest child, the other two having mostly grown up.

Viktor Gamov is a developer advocate at Confluent, the company that makes an event streaming platform based on Apache Kafka. Working in the field, Viktor Gamov has developed comprehensive expertise in building enterprise application architectures using open source technologies. Back in his consultancy days, he co-authored O’Reilly’s “Enterprise Web Development.” He is a professional conference speaker on distributed systems, Java and JavaScript topics. Follow Viktor on Twitter @gAmUssA, where he posts there about gym life, food, open source and, of course, Kafka and Confluent!

Subscribe to the Confluent Blog

Subscribe

More Articles Like This

Incremental Cooperative Rebalancing in Apache Kafka: Why Stop the World When You Can Change It?
Konstantine Karantasis

Incremental Cooperative Rebalancing in Apache Kafka: Why Stop the World When You Can Change It?

Konstantine Karantasis

There is a coming and a going / A parting and often no—meeting again. —Franz Kafka, 1897 Load balancing and scheduling are at the heart of every distributed system, and […]

Introducing Confluent Platform 5.3
Gaetan Castelein

Introducing Confluent Platform 5.3

Gaetan Castelein

Delivers the new Confluent Operator for cloud-native automation on Kubernetes, a redesigned Confluent Control Center user interface to simplify how you manage event streams, and a preview of Role-Based Access […]

Dawn of Kafka DevOps: Managing Multi-Cluster Kafka Connect and KSQL with Confluent Control Center
Robin Moffatt

Dawn of Kafka DevOps: Managing Multi-Cluster Kafka Connect and KSQL with Confluent Control Center

Robin Moffatt

In anything but the smallest deployment of Apache Kafka®, there are often going to be multiple clusters of Kafka Connect and KSQL. Kafka Connect is used for building event streaming […]

Fully managed Apache Kafka as a Service!

Try Free