Kora Engine, Data Quality Rules and more within our Q2'23 Launch | Register for demo
I’m proud to announce the release of Apache Kafka 3.2.0 on behalf of the Apache Kafka® community. The 3.2.0 release contains many new features and improvements. This blog will highlight some of the most prominent new features. For the full list of changes, be sure to see the release notes. You can also watch the release video for a summary of what’s new in Apache Kafka 3.2.0.
While KRaft mode is not yet recommended for production, we have introduced a KRaft-based authorizer along with several fixes and improvements. In addition, a proposal for marking KRaft mode as production ready in Apache Kafka 3.3 is being discussed by the community.
Since log4j 1.x has known security vulnerabilities and is not maintained anymore, we replaced it with reload4j. reload4j is a drop-in replacement with fixes for the known security vulnerabilities. We plan to migrate to log4j 2.x in the next major release of Apache Kafka (see KIP-653).
KIP-801 introduces a built-in authorizer, StandardAuthorizer, that does not depend on Zookeeper. This means you can now run a secure Kafka cluster without Zookeeper! StandardAuthorizer stores its ACLs in the
__cluster_metadata topic and it is used by default in KRaft clusters. StandardAuthorizer does all of the same things that AclAuthorizer does for Zookeeper-dependent clusters.
With KIP-704, the controller is now able to communicate to a newly elected topic partition leader whether it was elected using the unclean leader election strategy. This information tells the new topic partition leader that it needs to recover its state. As an example, this will be used in the future to clean up transaction state, which may be left inconsistent following an unclean election.
When there are many large clients, preferred leader election can cause many clients to open connections within a very short window of time. This can cause the SYN backlog for TCP’s acceptor sockets to be filled up resulting in retries being delayed or producers being slowed down.
KIP-764 introduces a new configuration
socket.listen.backlog.size that allows setting the size of the SYN backlog for TCP’s acceptor sockets on the brokers. Increasing this configuration can mitigate the issues resulting from many open connections.
KIP-784 adds an error code to the response of the
DescribeLogDirs API. In previous releases
DescribeLogDirs returned an empty response if users did not have the necessary authorization for the request. Clients had to interpret the empty response as a
CLUSTER_AUTHORIZATION_FAILED error. KIP-784 makes
DescribeLogDirs API consistent with other APIs and allows returning other errors besides
On Kafka brokers it is common to define multiple listeners. Each listener has its own pool of network threads. In many cases, some listeners handle a lot less traffic than others and typically do not need the same number of threads as the listeners that need to handle more traffic.
KIP-788 allows setting the pool size of network threads individually per listener. This allows for fine-tuning of the number of network threads to dynamically accommodate traffic spikes or slightly reduce memory usage when using listeners with different traffic loads. For this purpose, the existing configuration
num.network.threads is updated to support being set on specific listeners via
listener.name.<name of the listener>.num.network.threads.
The kafka-console-producer is an important debugging tool. KIP-798 provides a way to add headers to a record that is written to a topic. KIP-810 allows writing records with value
null to a topic. That means the kafka-console-producer can now produce tombstone records to a compacted topic. Both of these features improve debugging with the kafka-console-producer.
When a consumer leaves or joins a consumer group, it logs the reason locally. Until this release, the brokers did not have any information on the reasons a consumer joined or left the consumer group. That made rebalances triggered by
JoinGroupRequest hard to troubleshoot. KIP-800 propagates the reasons for leaving and joining a consumer group to the brokers, making troubleshooting rebalance issues easier.
Since Apache Kafka 2.4.0 when static membership was introduced, consumers can rejoin a consumer group after a brief absence without triggering a rebalance. If the leader of the consumer group had a brief absence and then rejoined, it would remain the leader. However, there was no way to let the rejoined consumer know that it was still the leader without triggering another rebalance. Ultimately this can cause the group to miss some metadata changes, such as partition increases. With KIP-814 the rejoining leader is informed about its leadership without computing a new assignment.
Starting with Apache Kafka 3.2.0, Kafka Streams can distribute its standby replicas over distinct “racks” with KIP-708. To form a “rack”, Kafka Streams uses tags in the application configuration. For example, Kafka Streams clients might be tagged with the cluster or the cloud region they are running in. Users can specify the tags that should be used for the rack-aware distribution of the standby replicas by setting the configuration
rack.aware.assignment.tags. During task assignment, Kafka Streams will try its best to distribute the standby replicas over different task dimensions. Rack-aware standby assignment improves fault tolerance in case of the failure of an entire “rack”. This can be used, for example, to ensure that replicas are distributed over different availability zones in a cloud hosting provider.
KIP-796 specifies an improved interface for Interactive Queries in Kafka Streams (IQv2). The new interface aims to make querying the state store simpler and faster as well as to reduce the maintenance cost when modifying existing state stores and when adding new state stores. KIP-796 describes the generic interface for querying state stores with Interactive Queries. Specific query types can be added to Interactive Query v2 by implementing the
Query interface. KIP-976 also defines the
KeyQuery class to allow users to evaluate a key/value lookup via IQv2.
KIP-805 adds the
RangeQuery class to Interactive Query v2. The
RangeQuery class is an implementation of the
Query interface that allows querying state stores over a range specified by upper or lower key bounds or scanning all records of a state store when no bounds are provided.
KIP-806 adds two implementations of the
WindowKeyQuery class and the
WindowRangeQuery class. The former allows scanning over windows with a given key within a given time range and the latter allows scanning over windows within a given time range independently of the keys of the windows.
KIP-796 is a long-term project that will be extended with new query types in future releases. As of Apache Kafka 3.2.0, IQv2 is in preview. The public documentation site has not been updated, and the interfaces of IQv2 are marked as
@Evolving (meaning that they may break compatibility in minor releases without a deprecation period if preview users find significant flaws in the current API). A future release will remove the
@Evolving annotation and designate IQv2 as stable.
KIP-791 adds method
recordMetadata() to the
StateStoreContext, providing access to the topic, partition, and offset of the record currently being processed. Exposing the current context in this way allows state stores to track their current offset in each input partition, allowing them to implement the consistency mechanisms introduced in KIP-796.
KIP-769 extends the
GET /connector-plugins endpoint with a new query parameter
connectorsOnly which when set to
false lists all the available plugins and not just connectors. The new query parameter helps users to verify what plugins are available without the need to know how the Connect runtime is set up. Usage of the new parameter is
GET /connector-plugins?connectorsOnly=false. By default
connectorsOnly is set to
true for compatibility with previous behavior.
Additionally, KIP-769 adds a new endpoint that will return the configurations of a given plugin. The new endpoint is used as follows:
GET /connector-plugins/<plugin>/config. The new endpoint works with all plugins returned by
KIP-808 introduces a new optional configuration field
unix.precision for the
TimestampConverter SMT that allows the user to define a desired precision for the SMT. Valid values for this new field are seconds, milliseconds, microseconds, and nanoseconds. This addition is motivated by the fact that in external systems Unix time is represented with different precisions.
KIP-779 makes source connectors resilient to producer exceptions. Since source connectors ingest data from systems users do not have control over, it might happen that received messages are too large or unprocessable for the configured Connect worker, Kafka broker, and other ecosystem components. Previously such an error always killed the connector.
With KIP-779 the
WorkerSourceTask checks for the configured
error.tolerance when sending the message fails. If
error.tolerance is set to
WorkerSourceTask will ignore the exception, allow the connector to acknowledge its source system and continue processing. If
error.tolerance is not set to
all, the source connector will fail.
A note on compatibility: Existing source connectors that set
errors.tolerance to all and expect to die on producer failure will need to be updated as described in the KIP. Source connectors that do not set
errors.tolerance to all will not be affected by this change and be killed in the event of a producer failure.
In addition to all the KIPs listed above, Apache Kafka 3.2.0 is packed with fixes and other improvements.
For next steps:
This was a huge community effort, so thank you to everyone who contributed to this release, including all our users and our 144 authors and reviewers:
A. Sophie Blee-Goldman, Adam Kotwasinski, Aleksandr Sorokoumov, Alexander Stohr, Alexandre Garnier, Alok Nikhil, Andras Katona, Andrew Eugene Choi, Antony Stubbs, Artem Livshits, Bill Bejeck, Bounkong Khamphousone, Boyang Chen, Bruno Cadonna, Chang, Chia-Ping Tsai, Chris Egerton, Colin P. McCabe, Cong Ding, David Arthur, David Jacot , David Mao, Ed B, Edwin, GauthamM-official, GuoPhilipse, Guozhang Wang, Gwen Shapira, Hao Li, Haoze Wu, Idan Kamara, Igor Soarez, Ismael Juma, Israel Ekpo, James Galasyn, James Hughes, Jason Gustafson, Jason Koch, Jeff Kim, Joel Hamill, John Roesler, Jonathan Albrecht, Jorge Esteban Quilcate Otoya, Josep Prat, Joseph (Ting-Chou) Lin, José Armando García Sancio, Jr, Jules Ivanic, Julien Chanaud, Jun Rao, Justin Lee, Justine Olshan, Kamal Chandraprakash, Kate Stanley, Kirk True, Knowles Atchison, Konstantine Karantasis, Kowshik Prakasam, Kurt Ostfeld, Kvicii, Leah Thomas, Lee Dongjin, Levani Kokhreidze, Liam Clarke-Hutchinson, Lucas Bradstreet, Ludovic DEHON, Luizfrf3, Luke Chen, Manikumar Reddy, Marc Löhe, Matthew Wong, Matthias J. Sax, Michal T, Mickael Maison, Mike Lothian, Márton Sigmond, Nick Telford, Nigel Liang, Niket, Okada Haruki, Omnia G H Ibrahim, Paolo Patierno, Patrick Stuedi, Philip Nee, Prateek Agarwal, Rajini Sivaram, Randall Hauch, Ricardo Brasil, Richard, RivenSun, Rob Leland, Ron Dagostino, Sayantanu Dey, Sean Li, Sergio Peña, Sherzod Mamadaliev, Shylaja Kokoori, Stanislav Vodetskyi, Tamara Skokova, Tim Patterson, Tolga H. Dur, Tom Bentley, Tomas Forsman, Tomonari Yamashita, Vicky Papavasileiou, Victoria Xia, Vijay Krishna, Vincent Jiang, Vladimir Sitnikov, Walker Carlson, Wenhao Ji, Wenjun Ruan, Xiaobing Fang, Xiaoyue Xue, YEONCHEOL JANG, Yang Yu, YeonCheol Jang, Yu, Zhang Hongyi, aSemy, bozhao12, defhacks, dengziming, florin-akermann, gf13871, jiangyuan, jiangyuan04, keashem, kurtostfeld, lhunyady, liym, loboxu, loboya~, mkandaswamy, prince-mahajan, sunshujie1990, vamossagar12, wangyap, xuexiaoyue, yasar03, zhonghou3, zzccctv, 工业废水, 彭小漪
This post was originally published by Bruno Cadonna on The Apache Software Foundation blog.
Companies are looking to optimize cloud and tech spend, and being incredibly thoughtful about which priorities get assigned precious engineering and operations resources. “Build vs. Buy” is being taken seriously again. And if we’re honest, this probably makes sense. There is a lot to optimize.
Operating Kafka at scale can consume your cloud spend and engineering time. And operating everyday tasks like scaling or deploying new clusters can be complex and require dedicated engineers. This post focuses on how Confluent Cloud is 1) Resource Efficient, 2) Fully Managed, and 3) Complete.