Build Predictive Machine Learning with Flink | Workshop on Dec 18 | Register Now

What’s New in Apache Kafka 2.5

Written By

On behalf of the Apache Kafka® community, it is my pleasure to announce the release of Apache Kafka 2.5.0. The community has created another exciting release.

We are making progress on KIP-500 and have added new metrics and security features, among other improvements. This blog post goes into detail on some of the added functionality, but to get a full list of what’s new in this release, please see the release notes.

Kafka broker, producer, and consumer

KIP-500 update

In Apache Kafka 2.5, some preparatory work has been done towards the removal of Apache ZooKeeper™ (ZK).

  • KIP-555: details about the ZooKeeper deprecation process in admin tools
  • KIP-543: dynamic configs will not require ZooKeeper access

Exactly once semantics (EOS) – Foundational improvements

KIP-447: Producer scalability for exactly once semantics

This KIP simplifies the API for applications that read from and write to Kafka transactionally. Previously, this use case typically required separate producer instances for each input partition, but now there is no special requirement. This makes it much easier to build EOS applications that consume large numbers of partitions. This is foundational for a similar improvement in Kafka Streams in the next release.

See KIP-447 for more details.

KIP-360: Improve reliability of idempotent/transactional producer

This KIP addresses a problem with producer state retention on the broker, which is what makes the idempotence guarantee possible. Previously, when the log was truncated to enforce retention or truncated from a call to delete records, the broker dropped producer state, which led to UnknownProducerId errors. With this improvement, the broker instead retains producer state until expiration. This KIP also gives the producer a powerful way to recover from unexpected errors.

See KIP-360 for more details.

Metrics and operational improvements

KIP-515: Enable ZK client to use the new TLS supported authentication (ZK 3.5.7)

Apache Kafka 2.5 now ships ZooKeeper 3.5.7. One feature of note is the newly added ZooKeeper TLS support in ZooKeeper 3.5. When deploying a secure Kafka cluster, it’s critical to use TLS to encrypt communication in transit. Apache Kafka 2.4 already ships with ZooKeeper 3.5, which adds TLS support between the broker and ZooKeeper. However, configuration information has to be passed via system properties as -D command line options on the Java invocation of the broker or CLI tool (e.g., zookeeper-security-migration), which is not secure. KIP-515 introduces the necessary changes to enable the use of secure configuration values for using TLS with ZooKeeper.

ZooKeeper 3.5.7 supports both mutual TLS authentication via its ssl.clientAuth=required configuration value and TLS encryption without client certificate authentication via ssl.clientAuth=none.

See KIP-515 for more details.

KIP-511: Collect and Expose Client’s Name and Version in the Brokers

Previously, operators of Apache Kafka could only identify incoming clients using the clientId field set on the consumer and producer. As this field is typically used to identify different applications, it leaves a gap in operational insight regarding client software libraries and versions. KIP-511 introduces two new fields to the ApiVersionsRequest RPC: ClientSoftwareName and ClientSoftwareVersion.

These fields are captured by the broker and reported through a new set of metrics. The metric MBean pattern is:

kafka.server:clientSoftwareName=(client-software-name),clientSoftwareVersion=(client-software-version),listener=(listener),networkProcessor=(processor-index),type=(type)

For example, the Apache Kafka 2.4 Java client produces the following MBean on the broker:

kafka.server:clientSoftwareName=apache-kafka-java,clientSoftwareVersion=2.4.0,listener=PLAINTEXT,networkProcessor=1,type=socket-server-metrics

See KIP-511 for more details.

KIP-559: Make the Kafka Protocol Friendlier with L7 Proxies

This KIP identifies and improves several parts of our protocol, which were not fully self-describing. Some of our APIs have generic bytes fields, which have implicit encoding. Additional context is needed to properly decode these fields. This KIP addresses this problem by adding the necessary context to the API so L7 proxies can fully decode our protocols.

See KIP-559 for more details.

KIP-541: Create a fetch.max.bytes configuration for the broker

Kafka consumers can choose the maximum number of bytes to fetch by setting the client-side configuration fetch.max.bytes. Too high of a value may degrade performance on the broker for other consumers. If the value is extremely high, the client request may time out. KIP-541 centralizes this configuration with a broker setting that puts an upper limit on the maximum number of bytes that the client can choose to fetch.

See KIP-541 for more details.

Kafka Connect

KIP-558: Track a connector’s active topics

During runtime, it’s not easy to know the topics a sink connector reads records from when a regex is used for topic selection. It’s also not possible to know which topics a source connector writes to. KIP-558 enables developers, operators, and applications to easily identify topics used by source and sink connectors.

$ curl -s 'http://localhost:8083/connector/a-source-connector/topics'
{"a-source-connector":{"topics":["foo","bar","baz"]}}

The topic tracking is enabled by default but can also be disabled with topic.tracking.enable=false.

See KIP-558 for more details.

Kafka Streams

KIP-150: Add Cogroup to the DSL

In the past, aggregating multiple streams into one could be complicated and error prone. It generally requires you to group and aggregate all of the streams into tables, then make multiple outer join calls. The new co-group operator cleans up the syntax of your programs, reduces the number of state store invocations, and overall increases performance.

KTable<K, CG> cogrouped =
  grouped1
    .cogroup(aggregator1)
    .cogroup(grouped2, aggregator2)
    .cogroup(grouped3, aggregator3)
    .aggregate(initializer1, materialized1);

See KIP-150 for more details.

KIP-523: Add toTable() to the DSL

A powerful way to interpret a stream of events is as a changelog and to materialize a table over it. KIP-523 as a toTable() function can be applied to a stream and materializes the latest value per key. It’s important to note that any null values will be interpreted as deletes for a given key (tombstones).

See KIP-523 for more details.

KIP-535: Allow state stores to serve stale reads during rebalance

Previously, interactive queries (IQs) against state stores would fail during the time period when there is a rebalance in progress. This degraded the uptime of applications that depend on the ability to query Kafka Streams’ tables of state. KIP-535 gives applications the ability to query any replica of a state store and observe how far each replica is lagging behind the primary.

See KIP-535 and this blog post for more details.

Deprecations

We have dropped support for Scala 2.11 in Apache Kafka 2.5. Scala 2.12 and 2.13 are now the only supported versions.

TLS 1.2 is now the default SSL protocol. TLS 1.0 and 1.1 are still supported.

Conclusion

To learn more about what’s new in Apache Kafka 2.5 and to see all the KIPs included in this release, be sure to check out the release notes and highlights in the video below.

This post was originally published by David Arthur on The Apache Software Foundation blog.

  • David Arthur is a software engineer on the Core Kafka Team at Confluent. He has 10 years of experience designing and developing software for a wide variety of industries. David was an early user of Kafka and became a committer around the time Kafka became a top-level project at the Apache Software Foundation. He also authored a popular Python client for Kafka which received wide adoption, although he now recommends Confluent’s client 😊. Apart from software and open source, David enjoys spending time in his gardening, operating amateur radio, and spending time with his wife and three children.

Did you like this blog post? Share it now