
Announcing Apache Kafka 0.10.1.0


The Apache Kafka community is pleased to announce the release of Apache Kafka 0.10.1.0. This is a feature release which includes the completion of 15 KIPs, over 200 bug fixes and improvements, and over 500 pull requests merged. A full list of the new features, improvements, and bug fixes can be found here.

Major Features

Kafka Server

Time-based Search: This release adds a searchable index for each topic based on message timestamps, which were introduced in 0.10.0.0. The index allows finer-grained log retention than was previously possible using only the timestamps of the log segments. It also enables consumers to look up offsets by timestamp, so you can seek to the position in a topic that corresponds to a given time.
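To make the lookup semantics concrete, here is a minimal Python sketch of a timestamp-to-offset search. It is a toy model, not Kafka's index implementation: the `TimeIndex` class and its methods are hypothetical names, and it assumes entries are appended with non-decreasing timestamps. It returns the earliest offset whose timestamp is at or after the requested time, which mirrors what the Java consumer exposes through `KafkaConsumer.offsetsForTimes`.

```python
from bisect import bisect_left
from typing import List, Optional


class TimeIndex:
    """Toy per-partition index from message timestamps to offsets (hypothetical)."""

    def __init__(self) -> None:
        self._timestamps: List[int] = []  # epoch millis, non-decreasing
        self._offsets: List[int] = []     # offset recorded for each timestamp

    def append(self, timestamp_ms: int, offset: int) -> None:
        # Assumption: entries arrive in non-decreasing timestamp order.
        if self._timestamps and timestamp_ms < self._timestamps[-1]:
            raise ValueError("timestamps must be non-decreasing")
        self._timestamps.append(timestamp_ms)
        self._offsets.append(offset)

    def offset_for_time(self, target_ms: int) -> Optional[int]:
        """Return the earliest offset whose timestamp is >= target_ms, else None."""
        i = bisect_left(self._timestamps, target_ms)
        return self._offsets[i] if i < len(self._offsets) else None


index = TimeIndex()
for offset, ts in enumerate([1000, 1500, 1500, 2200, 3000]):
    index.append(ts, offset)

start = index.offset_for_time(1500)  # first message at or after t=1500
```

A consumer would then seek to `start` and read forward. A request for a time later than the newest message returns `None`, because no message qualifies.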

Replication Quotas: Previously, data-intensive administrative operations (such as partition reassignment and new broker initialization) could affect client performance because of the stress that unbounded replication placed on the cluster. Replication quotas give you a way to set an upper bound on the bandwidth used for replication so that the impact on clients is predictable and tunable.
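Choosing a bound is a trade-off between client impact and how long the operation takes. The back-of-the-envelope sketch below estimates the duration of a data move under a given throttle. The function name and the numbers are hypothetical, and the estimate is a lower bound: it assumes the throttle is fully used and ignores new data written to the moving partitions in the meantime.

```python
def reassignment_seconds(bytes_to_move: int, throttle_bytes_per_sec: int) -> float:
    """Lower bound on the time to copy bytes_to_move under a replication throttle."""
    if throttle_bytes_per_sec <= 0:
        raise ValueError("throttle must be positive")
    return bytes_to_move / throttle_bytes_per_sec


GB = 1000 ** 3
MB = 1000 ** 2

# Hypothetical numbers: move 100 GB of replicas with a 10 MB/s throttle.
seconds = reassignment_seconds(100 * GB, 10 * MB)
hours = seconds / 3600  # a little under three hours
```

If the throttle is set below the rate at which the moved partitions receive new data, the new replicas can never catch up, so the bound needs to leave headroom above the incoming write rate.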

Improved Log Compaction: Previously it was difficult for clients to build a consistent snapshot of a compacted topic, since there was no way to tell when the consumer had reached the uncompacted portion of the log. Kafka now provides a configuration setting to control the time range that is eligible for log cleaning (leveraging the time-based index). As long as the consumer has read up to a recent point in the log, it is guaranteed to see a consistent view. Additionally, compaction and deletion can now be enabled on the same topic, which allows you to retain only data within a certain time window. We’ve also fixed a critical bug that caused the log cleaner to crash if the number of keys in a segment grew too large.
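The interaction between the compaction lag and time-based deletion can be illustrated with a small Python model. This is a simplified sketch, not the broker's log cleaner: `Record`, `clean`, `min_lag_ms`, and `retention_ms` are hypothetical names, and the model works on individual records, whereas Kafka operates on log segments.

```python
from typing import List, NamedTuple, Optional


class Record(NamedTuple):
    offset: int
    timestamp_ms: int
    key: str
    value: str


def clean(log: List[Record], now_ms: int, min_lag_ms: int,
          retention_ms: Optional[int] = None) -> List[Record]:
    """Toy cleaner: compact records older than min_lag_ms, optionally delete by age."""
    # Optional deletion step: drop records that fall outside the retention window.
    if retention_ms is not None:
        log = [r for r in log if now_ms - r.timestamp_ms <= retention_ms]

    def cleanable(r: Record) -> bool:
        # Only records at least min_lag_ms old are eligible for compaction.
        return now_ms - r.timestamp_ms >= min_lag_ms

    # Latest offset per key, considering only the cleanable portion of the log.
    latest = {r.key: r.offset for r in log if cleanable(r)}

    # Recent records are kept as-is; older ones survive only if they are the
    # newest cleanable record for their key.
    return [r for r in log if not cleanable(r) or latest[r.key] == r.offset]


log = [
    Record(0, 1000, "a", "1"),
    Record(1, 2000, "b", "1"),
    Record(2, 3000, "a", "2"),
    Record(3, 9000, "a", "3"),
    Record(4, 9500, "a", "4"),
]

compacted = clean(log, now_ms=10_000, min_lag_ms=2_000)
windowed = clean(log, now_ms=10_000, min_lag_ms=2_000, retention_ms=5_000)
```

With a two-second lag, only offset 0 is removed: offset 2 supersedes it, while the two most recent updates to key `a` stay untouched. Adding a five-second retention window also drops everything older than the window, leaving offsets 3 and 4.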

Kafka Client APIs

Interactive Queries: This release adds support for interactive queries, a capability first introduced in our post introducing Kafka Streams and since emulated in various stream processing platforms. Interactive queries let you treat the stream processing layer as a lightweight embedded database and directly query the latest state of your stream processing application, without first materializing that state to external databases or external storage. See here for more detail on how to use this feature.
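The idea can be sketched in a few lines of Python. This is a conceptual model only and does not use the Kafka Streams API: `WordCountApp`, `process`, and `query` are hypothetical names, and a plain in-memory counter stands in for a local state store.

```python
from collections import Counter
from typing import Iterable


class WordCountApp:
    """Toy stream processor that keeps its state in a local, queryable store."""

    def __init__(self) -> None:
        self._store: Counter = Counter()  # stands in for a local state store

    def process(self, records: Iterable[str]) -> None:
        # Update local state as each record arrives.
        for line in records:
            self._store.update(line.lower().split())

    def query(self, word: str) -> int:
        # Read the latest state directly; no external database is involved.
        return self._store[word]


app = WordCountApp()
app.process(["hello kafka", "hello streams"])
hello_count = app.query("hello")
```

The point of the design is that the application serving the query is the same one doing the processing, so there is no separate copy of the state to keep in sync.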

Consumer Stabilization: The consumer now supports background heartbeating, which handles use cases with higher variance in message processing times more gracefully. Additionally, we now have a configuration option to override the maximum size of individual fetches to give you an easier way to tune memory usage. Due to these and other improvements, we’ve removed the beta label from the new consumer.
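The effect of background heartbeating can be shown with a small decision model. It is a simplified sketch with a hypothetical function name, and it looks at a single slow batch. The two timeouts correspond to the consumer settings `session.timeout.ms` and `max.poll.interval.ms`.

```python
def stays_in_group(processing_ms: int, session_timeout_ms: int,
                   max_poll_interval_ms: int, background_heartbeat: bool) -> bool:
    """Return True if a consumer survives one batch that takes processing_ms."""
    if background_heartbeat:
        # Heartbeats continue during processing, so only the gap between
        # poll() calls is bounded.
        return processing_ms <= max_poll_interval_ms
    # Heartbeats are sent only from poll(), so slow processing looks like a
    # dead consumer once the session timeout expires.
    return processing_ms <= session_timeout_ms


# Hypothetical settings: 10 s session timeout, 5 min poll interval, 45 s batch.
old = stays_in_group(45_000, 10_000, 300_000, background_heartbeat=False)
new = stays_in_group(45_000, 10_000, 300_000, background_heartbeat=True)
```

Under these assumed numbers the consumer without background heartbeats is removed from the group, while the one with them keeps its partitions. Separating the two timeouts lets the session timeout stay short for fast failure detection even when individual batches are slow.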

Improved Memory Management: Streams applications now benefit from record caches. Notably, these caches are used to compact output records (similar to Kafka’s log compaction) so that fewer updates for the same record key are sent downstream. The new caches are enabled by default and typically reduce the load on your streams application, your Kafka cluster, and downstream applications and systems such as external databases.
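A minimal Python sketch of the caching idea follows. It is a toy write-back cache rather than the Kafka Streams implementation: `RecordCache`, `put`, and `flush` are hypothetical names, and the sketch flushes on demand instead of on a size or time trigger.

```python
from typing import Dict, List, Tuple


class RecordCache:
    """Toy write-back cache that forwards only the latest value per key on flush."""

    def __init__(self) -> None:
        self._dirty: Dict[str, int] = {}

    def put(self, key: str, value: int) -> None:
        # A newer update for the same key overwrites the cached one.
        self._dirty[key] = value

    def flush(self) -> List[Tuple[str, int]]:
        # Emit one record per key, then start again with an empty cache.
        out = list(self._dirty.items())
        self._dirty.clear()
        return out


cache = RecordCache()
for key, count in [("a", 1), ("b", 1), ("a", 2), ("a", 3)]:
    cache.put(key, count)

downstream = cache.flush()  # four updates collapse into two records
```

The trade-off is latency for volume: downstream consumers see fewer intermediate updates, but each update is delayed until the cache is flushed.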

Secure Quotas: Kafka client quotas, first introduced in 0.9.0.0, use the unauthenticated client.id configured by the user. Secure quotas now allow you to define quotas based on the authenticated user principal, which gives your cluster much better protection in a secure environment.
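The difference between the two keying schemes can be shown with a short Python sketch. It only models the accounting, not the broker's quota enforcement: `Request`, `usage_by`, and the byte limit are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, NamedTuple


class Request(NamedTuple):
    principal: str   # authenticated identity, established by the broker
    client_id: str   # free-form string chosen by the client
    size_bytes: int


def usage_by(requests: Iterable[Request],
             key: Callable[[Request], str]) -> Dict[str, int]:
    """Total bytes per quota entity, where `key` picks the entity for a request."""
    totals: Dict[str, int] = defaultdict(int)
    for r in requests:
        totals[key(r)] += r.size_bytes
    return dict(totals)


# One user spreads traffic over two client ids.
requests = [
    Request("User:alice", "app-1", 600),
    Request("User:alice", "app-2", 600),
]
LIMIT = 1000  # hypothetical byte quota per entity

by_client = usage_by(requests, lambda r: r.client_id)
by_user = usage_by(requests, lambda r: r.principal)

over_by_client = [k for k, v in by_client.items() if v > LIMIT]
over_by_user = [k for k, v in by_user.items() if v > LIMIT]
```

Keyed by client id, neither entry exceeds the limit, because the client chooses its own id and can split its traffic. Keyed by the authenticated principal, the same traffic counts against a single entry and goes over the limit.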

Notable Bug Fixes and Improvements

Kafka Server

KAFKA-4093: Cluster id

KAFKA-4298: LogCleaner writes inconsistent compressed message set if topic message format != message format

KAFKA-3894: LogCleaner thread crashes if not even one segment can fit in the offset map

KAFKA-3396: Unauthorized topics are returned to the user

KAFKA-4019: LogCleaner should grow read/write buffer to max message size for the topic

KAFKA-3916: Connection from controller to broker disconnects

KAFKA-4129: Processor throw exception when getting channel remote address after closing the channel

Kafka Client APIs

KAFKA-3680: Make Java client classloading more flexible

KAFKA-2948: Kafka producer does not cope well with topic deletions

KAFKA-3937: Kafka Clients Leak Native Memory For Longer Than Needed With Compressed Messages

KAFKA-3854: Subsequent regex subscription calls fail

KAFKA-4098: NetworkClient should not intercept all metadata requests on disconnect

KAFKA-3775: Throttle maximum number of tasks assigned to a single KafkaStreams

KAFKA-3776: Unify store and downstream caching in streams

KAFKA-3812: State store locking is incorrect

KAFKA-3938: Fix consumer session timeout issue in Kafka Streams

KAFKA-4153: Incorrect KStream-KStream join behavior with asymmetric time window

KAFKA-3845: Support per-connector converters

KAFKA-3846: Connect record types should include timestamps

KAFKA-2894: WorkerSinkTask doesn’t handle rewinding offsets on rebalance

KAFKA-3054: Connect Herder fail forever if sent a wrong connector config or task config

KAFKA-3850: WorkerSinkTask should retry commits if woken up during rebalance or shutdown

KAFKA-4042: DistributedHerder thread can die because of connector & task lifecycle exceptions

Contributors

This release was a huge community effort. We’ve welcomed contributions from 115 people (according to git shortlog). Big thanks to everyone who helped out!

Alex Glikson, Alex Loddengaard, Alexey Ozeritsky, Alexey Romanchuk, Andrea Cosentino, Andrew Otto, Andrey Neporada, Apurva Mehta, Arun Mahadevan, Ashish Singh, Avi Flax, Ben Stopford, Bharat Viswanadham, Bill Bejeck, Bryan Baugher, Chen Zhu, Christian Posta, Damian Guy, Dan Norwood, Dana Powers, David Chen, Derrick Or, Dong Lin, Dustin Cote, Edoardo Comar, Elias Levy, Eno Thereska, Eric Wasserman, Ewen Cheslack-Postava, Filipe Azevedo, Flavio Junqueira, Florian Hussonnois, Geoff Anderson, Grant Henke, Greg Fodor, Guozhang Wang, Gwen Shapira, Hans Deragon, Henry Cai, Ishita Mandhan, Ismael Juma, Jaikiran Pai, Jakub Dziworski, Jakub Pilimon, James Cheng, Jan Filipiak, Jason Gustafson, Jay Kreps, Jeff Klukas, Jendrik Poloczek, Jeyhun Karimov, Jiangjie Qin, Johnny Lim, Jonathan Bond, Jun Rao, Kaufman Ng, Kenji Yoshida, Konstantine Karantasis, Kota Uchida, Laurier Mantel, Liquan Pei, Luke Zaparaniuk, Magnus Reftel, Manikumar Reddy O, Manu Zhang, Mark Grover, Mathieu Fenniak, Matthias J. Sax, Maysam Yabandeh, Mayuresh Gharat, Michael G. Noll, Mickael Maison, Moritz Siuts, Nafer Sanabria, Nihed Bbarek, Onur Karaman, P. Thorpe, Peter Davis, Philippe Derome, Pierre Coquentin, Rajini Sivaram, Randall Hauch, Rekha Joshi, Roger Hoover, Rollulus, Ryan Pridgeon, Sahil Kharb, Samuel Taylor, Sasaki Toru, Satendra Kumar, Sebastien Launay, Shikhar Bhushan, Shuai Zhang, Som Sahu, Sriharsha Chintalapani, Sumit Arrawatia, Tao Xiao, Thanasis Katsadas, Tim Brooks, Todd Palino, Tom Crayford, Tom Rybak, Vahid Hashemian, Wan Wenli, William Thurston, William Yu, Xavier Léauté, Yang Wei, Yeva Byzek, Yukun Guo, Yuto Kawamura, Zack Dever, 1ambda, leisore, sven0726

Download

To download Apache Kafka 0.10.1.0, visit the download page. Also stay tuned for the upcoming 3.1.0 release of Confluent’s Enterprise version of Kafka. In addition to all the 0.10.1.0 features mentioned above, we offer the Confluent Control Center to monitor your Kafka cluster, and tools for multi-datacenter replication and auto data balancing. This offering is backed by our subscription support. We also offer expert training and technical consulting to help get your organization started. Register to get notifications about this release.

  • Jason Gustafson is a senior principal engineer on the Kafka team.
