Soon, Apache Kafka® will no longer need ZooKeeper! With KIP-500, Kafka will include its own built-in consensus layer, removing the ZooKeeper dependency altogether. The next big milestone in this effort is coming in Apache Kafka 2.8.0, where you will have early access to the new code, the ability to spin up a development version of Kafka without ZooKeeper, and the opportunity to play with the Raft implementation as the distributed consensus algorithm.
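If you want to try it as soon as the release lands, the early access workflow is expected to look roughly like the following. This is a minimal sketch based on the KRaft quickstart shipped with the 2.8.0 early access code; script names, paths, and flags may change before this is production ready:

```bash
# Generate a random cluster ID
./bin/kafka-storage.sh random-uuid

# Format the storage directory with that ID (substitute the UUID printed above)
./bin/kafka-storage.sh format -t <uuid> -c ./config/kraft/server.properties

# Start a single combined controller+broker -- no ZooKeeper anywhere
./bin/kafka-server-start.sh ./config/kraft/server.properties
```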
The blog post Apache Kafka Needs No Keeper: Removing the Apache ZooKeeper Dependency discusses the problems with external metadata management, the main architectural changes, and how ZooKeeper removal improves Kafka. Ultimately, removing ZooKeeper simplifies the overall infrastructure design and operational workflows for your Kafka deployments. We’ve compiled a list of concrete benefits that result from this simplification, with a particular focus on things you will be able to STOP doing. It turns out that there are a lot of things you will be able to stop doing, and we think you won’t miss them. Once ZooKeeper is removed as a dependency from Kafka, your life gets easier in a few different areas:
ZooKeeper is an entirely separate system from Kafka, with its own deployment patterns, configuration file syntax, and management tools. Remove ZooKeeper from Kafka and you no longer have to administer a separate service. Better still, with KIP-500 you can optionally deploy the controller and broker in the same JVM, which simplifies administration even further. You can now stop:

- Running `systemctl` for yet another Linux service (in contrast, with KIP-500, a controller and broker can optionally run in the same JVM)
- Tuning `zookeeper.connection.timeout.ms` and `zookeeper.session.timeout.ms` in your broker configuration (see the configuration sketch after this list)
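For the sake of illustration, here is roughly what that difference looks like in a broker configuration. The hostnames and values are hypothetical, and the KIP-500 snippet is abridged from the early access `config/kraft/server.properties`, so treat it as a sketch rather than a complete config:

```bash
# With ZooKeeper: the broker config must point at, and be tuned for, the ensemble
cat > server.properties <<'EOF'
zookeeper.connect=zk1:2181,zk2:2181,zk3:2181
zookeeper.connection.timeout.ms=18000
zookeeper.session.timeout.ms=18000
EOF

# With KIP-500 (early access): no ZooKeeper settings at all; one process can
# act as both controller and broker
cat > server.properties <<'EOF'
process.roles=broker,controller
node.id=1
controller.quorum.voters=1@localhost:9093
listeners=PLAINTEXT://localhost:9092,CONTROLLER://localhost:9093
controller.listener.names=CONTROLLER
EOF
```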
Storage is the main consideration for ZooKeeper deployments, and without ZooKeeper, you don’t have to deal with ZooKeeper capacity planning, disk issues, and snapshots. You can now stop:

- Tuning `autopurge.purgeInterval` and `autopurge.snapRetainCount` to keep old snapshots and transaction logs from filling the disk
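As a reminder of what goes away, these are the sorts of `zoo.cfg` housekeeping settings, plus the manual cleanup fallback, that you no longer need. The values here are illustrative:

```bash
# zoo.cfg housekeeping you no longer maintain:
# keep the three most recent snapshots, purging the rest every hour
cat >> zoo.cfg <<'EOF'
autopurge.snapRetainCount=3
autopurge.purgeInterval=1
EOF

# ...and where autopurge was disabled, the periodic manual cleanup job:
./bin/zkCleanup.sh -n 3   # retain only the 3 newest snapshots/logs
```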
One of the key changes with KIP-500 is improved control plane traffic. Without KIP-500, broker operations require reading the metadata for all topics and partitions from ZooKeeper, which can take a long time in a large cluster. With KIP-500, brokers store metadata locally in a log and read only the latest set of changes from the controller (similar to how Kafka consumers can read just the end of a log rather than the entire log), improving these operations from O(N) to O(1). Control plane operations therefore perform significantly better, so you can now stop waiting on full metadata reads in large clusters.
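The early access release also ships a shell for poking at that local metadata log. A hypothetical session might look like the following; the tool name and log path come from the 2.8.0 early access layout and may change:

```bash
# Inspect a node's local metadata log with the early access shell
# (this is the default log path for a combined-mode dev node; yours may differ)
./bin/kafka-metadata-shell.sh \
  --snapshot /tmp/kraft-combined-logs/__cluster_metadata-0/00000000000000000000.log

# Inside the shell, metadata is browsable like a filesystem, e.g.:
#   >> ls /
```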
Any service in your mission-critical deployment must be monitored, and if you are using ZooKeeper, it must be monitored like every other service in your Kafka deployment. So if you remove ZooKeeper, you can stop:

- Watching ZooKeeper’s own JMX metrics, such as `NumAliveConnections`, `OutstandingRequests`, `AvgRequestLatency`, `MaxRequestLatency`, `HeapMemoryUsage`, etc. (a command-line version of these checks is sketched after this list)
- Watching the brokers’ ZooKeeper client metrics, such as `ZooKeeperDisconnectsPerSec`, `ZooKeeperExpiresPerSec`, `ZooKeeperReadOnlyConnectsPerSec`, `ZooKeeperSyncConnectsPerSec`, `ZooKeeperAuthFailuresPerSec`, `ZooKeeperSaslAuthenticationsPerSec`, etc.
- Running `org.apache.zookeeper.server.LogFormatter` and `org.apache.zookeeper.server.SnapshotFormatter` to decode ZooKeeper’s transaction logs and snapshots
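To make the first bullet concrete: many teams scripted these checks with ZooKeeper’s “four letter word” commands instead of JMX. Here is a rough sketch of what you can now delete, assuming the commands are whitelisted via `4lw.commands.whitelist` on the server:

```bash
# Health checks against a ZooKeeper node that you no longer need to script:
echo ruok | nc zk-host 2181                  # liveness: expect "imok"
echo mntr | nc zk-host 2181                  # metrics: zk_avg_latency,
                                             # zk_num_alive_connections, ...
echo srvr | nc zk-host 2181 | grep -i mode   # role: leader or follower?
```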
If issues emerge in your Kafka deployment, ZooKeeper creates an added element that requires investigation. Without ZooKeeper, troubleshooting can focus on the core components, so you can now stop:

- Watching the `/var/log/messages` file fill up with verbose logs during network outages
- Improvising on hosts that don’t have `nc` installed, perhaps due to enterprise policy (hint: `echo srvr | (exec 3<>/dev/tcp/zk-host/2181; cat >&3; cat <&3; exec 3<&-) | grep -i mode`)

Even though KIP-500 isn’t fully implemented yet, right now you can swing your tools from getting metadata from ZooKeeper over to getting metadata from the brokers instead, as described in the blog post Preparing Your Clients and Tools for KIP-500: ZooKeeper Removal from Apache Kafka.
| | With ZooKeeper | Without ZooKeeper |
|---|---|---|
| Configuring clients and services | `zookeeper.connect=zookeeper:2181` | `bootstrap.servers=broker:9092` |
| Configuring Schema Registry | `kafkastore.connection.url=zookeeper:2181` | `kafkastore.bootstrap.servers=broker:9092` |
| Kafka administrative tools | `kafka-topics --zookeeper zookeeper:2181 ...` | `kafka-topics --bootstrap-server broker:9092 … --command-config <properties to connect to brokers>` |
| REST Proxy API | v1 | v2 or v3 |
| Getting the Kafka cluster ID | `zookeeper-shell zookeeper:2181 get /cluster/id` | `kafka-metadata-quorum`, or view `meta.properties`, or `confluent cluster describe --url http://broker:8090 --output json` |
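For example, the last row without ZooKeeper might look like this in practice. The flags come from recent Kafka versions and the data directory path is illustrative, so adjust both for your environment:

```bash
# Ask the brokers (not ZooKeeper) for the cluster ID
kafka-metadata-quorum --bootstrap-server broker:9092 describe --status | grep -i clusterid

# ...or read it from a broker's local data directory
grep cluster.id /var/lib/kafka/data/meta.properties   # path is illustrative
```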
And then try out the early access code coming in the next major Kafka release. Stay tuned for the Apache Kafka 2.8.0 release blog post for more details.