I’m proud to announce the release of Apache Kafka 2.7.0 on behalf of the Apache Kafka® community. The 2.7.0 release contains many new features and improvements. This blog post highlights some of the more prominent ones. Be sure to see the release notes for the full list of changes. You can also watch the release video for a summary of what’s new:
In this release, we’ve continued steady progress toward replacing ZooKeeper in Kafka. KIP-497 adds a new inter-broker API for altering the in-sync replica (ISR) set. This release also adds the core Raft implementation as part of KIP-595: there is now a separate “raft” module containing the core consensus protocol. Until integration with the controller (the broker in the Kafka cluster responsible for managing the state of partitions and replicas) is complete, there is a standalone server that you can use for testing the performance of the Raft implementation.
Of course, there are additional efforts underway toward replacing ZooKeeper, with seven KIPs in active development to provide support for more partitions per cluster, simpler operation, and tighter security.
Tiered Storage work continues via KIP-405, unlocking infinite scaling and faster rebalance times.
When a Java client producer aborts a transaction with any non-flushed (pending) data, a fatal exception is thrown. But aborting a transaction with pending data is in fact a normal situation. The exception should notify you that records aren’t being sent, not that the application is in an unrecoverable state. KIP-654 introduces a new exception, TransactionAbortedException, allowing you to retry if desired.
Currently, Kafka only supports JKS or PKCS12 file-based key and trust stores when using SSL. While it’s no longer a standard for email, Privacy-Enhanced Mail (PEM) is a standard format for storing and distributing cryptographic keys and certificates. KIP-651 adds support for PEM files for key and trust stores, allowing the use of third-party providers that rely on the PEM format.
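As an illustrative sketch, a broker or client could switch its SSL stores to PEM with configuration along these lines (the file paths are placeholders):

```properties
# Use PEM instead of the JKS/PKCS12 default
ssl.keystore.type=PEM
ssl.truststore.type=PEM

# Point the stores at PEM files (paths are examples)
ssl.keystore.location=/etc/kafka/certs/keystore.pem
ssl.truststore.location=/etc/kafka/certs/truststore.pem
```

KIP-651 also allows the key and certificate material to be supplied inline in the configuration itself, which is convenient when configs are injected by an external secrets manager.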
Creating connections adds CPU overhead to the broker. Connection storms can come from seemingly well-behaved clients and can stop the broker from performing other useful work. Now there is a way of enforcing broker-wide and per-listener connection creation rates. The 2.7.0 release contains the first part of KIP-612, with per-IP connection rate limits expected in the 2.8.0 release.
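A minimal broker configuration sketch, assuming a listener named EXTERNAL (the rate values here are purely illustrative):

```properties
# Broker-wide cap on new connections accepted per second (KIP-612)
max.connection.creation.rate=50

# Tighter cap just for the EXTERNAL listener
listener.name.external.max.connection.creation.rate=20
```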
The APIs to create topics, create partitions, and delete topics are operations that have a direct impact on the overall load in the Kafka controller. To prevent a cluster from being overwhelmed due to high concurrent topic and partition creations or topic deletions, there is a new quota limiting these operations. See KIP-599 for more details.
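For example, an operator could set a default mutation quota for all clients with the kafka-configs tool (a sketch assuming a broker at localhost:9092; the rate value is illustrative):

```shell
# Cap topic/partition creations and topic deletions at
# 10 mutations per second for the default client quota
bin/kafka-configs.sh --bootstrap-server localhost:9092 --alter \
  --add-config 'controller_mutation_rate=10' \
  --entity-type clients --entity-default
```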
Apart from broker-client compatibility (which Kafka has a strong record of preserving), two main questions arise when new features become available in Kafka: how do clients discover which features a broker supports, and how do operators safely enable new features across a cluster?
KIP-584 provides a flexible and operationally friendly solution for client discovery, feature gating, and rolling upgrades using a single restart.
With KIP-554, SCRAM credentials can be managed via the Kafka protocol, and the kafka-configs tool was updated to use the newly introduced protocol APIs. This is another important step toward KIP-500, where ZooKeeper is replaced by a built-in quorum.
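As a sketch of the updated tooling (assuming a broker at localhost:9092; the user name and password are placeholders), credentials can now be managed with --bootstrap-server rather than a direct ZooKeeper connection:

```shell
# Create or update SCRAM credentials over the Kafka protocol
bin/kafka-configs.sh --bootstrap-server localhost:9092 --alter \
  --add-config 'SCRAM-SHA-256=[iterations=8192,password=alice-secret]' \
  --entity-type users --entity-name alice

# Describe the credentials for that user
bin/kafka-configs.sh --bootstrap-server localhost:9092 --describe \
  --entity-type users --entity-name alice
```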
Currently, Kafka partition leader and ISR information is stored in ZooKeeper. Either the controller or a partition leader may update this information. Because both can update this state, a mechanism is needed to keep it in sync, which can delay the propagation of ISR changes. As a result of these delays, metadata requests may receive stale information.
In the 2.7.0 release, there is a new AlterIsr API, which gives the controller the exclusive ability to update the state of partition leaders and ISR. The chief benefit of this new API is that metadata requests will always reflect the latest state.
The addition of this API is a significant step forward in the process of removing ZooKeeper and the completion of KIP-500. For more information, see KIP-497.
Now you can print the headers on a ConsumerRecord with the ConsoleConsumer. See KIP-431 for more details.
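A quick sketch of the new options (assuming a broker at localhost:9092; the topic name is a placeholder):

```shell
# Print record headers alongside values, using a custom header separator
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 \
  --topic my-topic --from-beginning \
  --property print.headers=true \
  --property headers.separator=,
```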
KIP-632 adds a DirectoryConfigProvider class to support users needing to provide secrets for keys stored in a container filesystem, such as a Kubernetes environment.
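A configuration sketch showing how the provider might be wired up, e.g., where a Kubernetes Secret is mounted as files under a directory (the provider alias, mount path, and property name are illustrative):

```properties
# Register the provider under the alias "dir"
config.providers=dir
config.providers.dir.class=org.apache.kafka.common.config.provider.DirectoryConfigProvider

# ${dir:<path>:<key>} resolves to the contents of the file <path>/<key>,
# here the file /mnt/secrets/db/password
database.password=${dir:/mnt/secrets/db:password}
```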
Today, if a user deletes the source topic of a running Kafka Streams application, the embedded consumer clients gracefully shut down. This client shutdown triggers rebalancing until all StreamThreads of the Streams application gracefully exit, leaving the application completely shut down without any chance to respond to the error. With the addition of KIP-662, when a user deletes a source topic from a running Streams application, the app throws a MissingSourceTopicException, allowing you to react to the error.
KIP-648 changes the getter methods for interactive query objects to follow the Kafka format of not using the get prefix.
Currently, when using an iterator over a Kafka Streams state store, you can only traverse elements from oldest to newest. When iterating over a windowed state store and the user desires to return the latest N records, there is no choice but to use the inefficient approach of traversing all the oldest records before getting to the desired newer records. KIP-617 adds support for iteration over a state store in reverse. Iterating in reverse makes a latest N records retrieval much more efficient.
Kafka Streams now has better Scala implicit Serdes support with KIP-616.
Currently, the actual end-to-end latency of a record flowing through Kafka Streams is difficult to gauge at best. Kafka Streams now exposes end-to-end metrics, which will be a great help for enabling users to make design choices. See KIP-613 for more information.
The current metrics exposed by Kafka Streams for RocksDB do not include information on memory or disk usage. Now in 2.7.0, Kafka Streams reports properties RocksDB exposes by default. See KIP-607 for more details.
Kafka Streams implements session windows, tumbling windows, and hopping windows as windowed aggregation methods. While hopping windows with a small advance time can imitate the behavior of a sliding window, this implementation’s performance is poor because it results in many overlapping and often redundant windows that require expensive calculations. With the addition of sliding windows via KIP-450, Kafka Streams now provides an efficient way to perform sliding aggregations.
To download Apache Kafka 2.7.0, visit the project’s download page.
Of course, this release would not have been possible without a huge effort from the community. A big thank you to everyone involved in this release, including the following 116 people (according to the git shortlog) who contributed either code or documentation:
A. Sophie Blee-Goldman, Chia-Ping Tsai, John Roesler, David Jacot, Jason Gustafson, Matthias J. Sax, Bruno Cadonna, Ismael Juma, Guozhang Wang, Rajini Sivaram, Luke Chen, Boyang Chen, Tom Bentley, showuon, leah, Bill Bejeck, Chris Egerton, Ron Dagostino, Randall Hauch, Xavier Léauté, Kowshik Prakasam, Konstantine Karantasis, David Arthur, Mickael Maison, Colin Patrick McCabe, huxi, Nikolay, Manikumar Reddy, Jorge Esteban Quilcate Otoya, Vito Jeng, bill, Bob Barrett, vinoth chandar, feyman2016, Jim Galasyn, Greg Harris, khairy, Sanjana Kaundinya, Ning Zhang, Aakash Shah, Andras Katona, Andre Araujo, Andy Coates, Anna Povzner, Badai Aqrandista, Brian Byrne, Dima Reznik, Jeff Kim, John Thomas, Justine Olshan, Lee Dongjin, Leonard Ge, Lucas Bradstreet, Mario Molina, Michael Bingham, Rens Groothuijsen, Stanislav Kozlovski, Yuriy Badalyantc, Levani Kokhreidze, Lucent-Wong, Gokul Srinivas, Mandar Tillu, Gal Margalit, tswstarplanet, Evelyn Bayes, Micah Paul Ramos, vamossagar12, Ego, Navina Ramesh, Nikhil Bhatia, Edoardo Comar, Nikolay Izhikov, Dhruvil Shah, Nitesh Mor, Noa Resare, David Mao, Raman Verma, Cheng Tan, Adam Bellemare, Richard Fussenegger, Rob Meng, Rohan, Can Cecen, Benoit Maggi, Sasaki Toru, Shaik Zakir Hussain, Shailesh Panwar, Sharath Bhat, voffcheg109, Thorsten Hake, Auston, Vikas Singh, Ashish Roy, Arjun Satish, xakassi, Zach Zhang, albert02lowis, Antony Stubbs, Ankit Kumar, gnkoshelev, high.lee, huangyiming, Andrew Egelhofer, jeff kim, jiameixie, Andrew Choi, JoelWee, Jesse Gorzinski, Alex Diachenko, Ivan Yurchenko, manijndl7, Igor Soarez, Gonzalo Muñoz, sbellapu, serjchebotarev, and Adem Efe Gencer
This post was originally published by Bill Bejeck on The Apache Software Foundation blog.
Bill Bejeck is working at Confluent as an integration architect on the Developer Relations team. He was a software engineer for over 15 years and has regularly contributed to Kafka Streams. Before Confluent, he worked on various ingest applications as a U.S. Government contractor using distributed software such as Apache Kafka, Spark, and Hadoop. He has also written a book about Kafka Streams titled Kafka Streams in Action.