The Apache Kafka community is pleased to announce the release of Apache Kafka 0.10.1.0. This is a feature release which includes the completion of 15 KIPs, over 200 bug fixes and improvements, and over 500 pull requests merged. A full list of the new features, improvements, and bug fixes can be found here.
Time-based Search: This release of Kafka adds a searchable index for each topic, based on the message timestamps that were added in 0.10.0.0. This allows finer-grained log retention than was previously possible using only the timestamps from the log segments. It also enables consumer support for offset lookup by timestamp, which lets you seek to a position in the topic based on a given time.
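The lookup semantics can be illustrated with a small stand-alone Java model. This is only a sketch, not Kafka's index implementation: the `TimestampLookup` class and its `Record` type are hypothetical, and it assumes the behavior documented for the consumer's `offsetsForTimes` call, which returns the earliest offset whose timestamp is greater than or equal to the requested time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.OptionalLong;

// Toy model of offset lookup by timestamp (not Kafka's actual index code).
final class TimestampLookup {
    // One log entry: its offset and its message timestamp in epoch milliseconds.
    static final class Record {
        final long offset;
        final long timestampMs;

        Record(long offset, long timestampMs) {
            this.offset = offset;
            this.timestampMs = timestampMs;
        }
    }

    // Returns the earliest offset whose timestamp is >= targetMs,
    // or an empty result if no such record exists.
    // The list is expected to be in offset order.
    static OptionalLong offsetForTime(List<Record> log, long targetMs) {
        for (Record r : log) {
            if (r.timestampMs >= targetMs) {
                return OptionalLong.of(r.offset);
            }
        }
        return OptionalLong.empty();
    }

    public static void main(String[] args) {
        List<Record> log = new ArrayList<>();
        log.add(new Record(0L, 1_000L));
        log.add(new Record(1L, 2_000L));
        log.add(new Record(2L, 3_000L));
        // The result is the position a consumer would seek to.
        System.out.println(offsetForTime(log, 1_500L));
    }
}
```

In a real client, the offset returned by the lookup would then be passed to the consumer's `seek` method for the partition in question.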
Replication Quotas: Previously, data-intensive administrative operations (such as partition reassignment and new broker initialization) could affect client performance because of the stress that unbounded replication placed on the cluster. Replication quotas give you a way to set an upper bound on the bandwidth used for replication so that the impact on clients is predictable and tunable.
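As a rough illustration of why a fixed bound makes the impact predictable, the following Java sketch estimates the minimum time a data movement needs under a given throttle. The class and method names are hypothetical and not part of any Kafka API, and the calculation assumes the throttle is the only bottleneck and ignores new writes that arrive while the data is moving.

```java
// Back-of-the-envelope estimate; not part of any Kafka API.
final class ThrottleEstimate {
    // Minimum number of seconds needed to move bytesToMove when replication
    // traffic is capped at throttleBytesPerSec (rounded up to whole seconds).
    static long minSeconds(long bytesToMove, long throttleBytesPerSec) {
        if (throttleBytesPerSec <= 0) {
            throw new IllegalArgumentException("throttle must be positive");
        }
        return (bytesToMove + throttleBytesPerSec - 1) / throttleBytesPerSec;
    }

    public static void main(String[] args) {
        long bytesToMove = 100L * 1024 * 1024 * 1024; // 100 GiB, illustrative
        long throttle = 50L * 1024 * 1024;            // 50 MiB/s, illustrative
        System.out.println(minSeconds(bytesToMove, throttle) + " seconds");
    }
}
```

A lower throttle lengthens the operation but leaves more bandwidth for clients, which is the trade-off the quota lets you tune.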
Improved Log Compaction: Previously it was difficult for clients to build a consistent snapshot of a compacted topic, since there was no way to tell when the consumer had reached the uncompacted portion of the log. Kafka now provides a configuration setting to control the time range that is eligible for log cleaning (leveraging the time-based index). As long as the consumer has read up to a recent point in the log, it is guaranteed to see a consistent view. Additionally, we have made it possible to enable both compaction and deletion on the same topic, which allows you to retain only data within a certain time window. We’ve also fixed a critical bug that caused the log cleaner to crash if the number of keys in a segment got too large.
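A minimal sketch of the topic-level settings involved, expressed as Java `Properties`. It assumes that the setting described above is the `min.compaction.lag.ms` topic config and that combined cleanup is requested with `cleanup.policy=compact,delete`; the class name and the chosen values are illustrative only.

```java
import java.util.Properties;

// Illustrative topic-level settings; the values are examples, not recommendations.
final class CompactedTopicConfig {
    static Properties build() {
        Properties p = new Properties();
        // Enable both compaction and time-based deletion on the same topic.
        p.setProperty("cleanup.policy", "compact,delete");
        // Keep records uncompacted for at least one hour after they are written.
        p.setProperty("min.compaction.lag.ms", String.valueOf(60L * 60 * 1000));
        // Delete data older than seven days.
        p.setProperty("retention.ms", String.valueOf(7L * 24 * 60 * 60 * 1000));
        return p;
    }

    public static void main(String[] args) {
        // Print each key=value pair as it would be supplied to the topic.
        build().forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

These key/value pairs can be supplied as topic configs, for example through the `--config` option of `kafka-topics.sh` when the topic is created.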
Interactive Queries: This release adds support for interactive queries, a capability first described in our post introducing Kafka Streams and since emulated in various stream processing platforms. Interactive queries let you treat the stream processing layer as a lightweight embedded database and directly query the latest state of your stream processing application, without first materializing that state to external databases or external storage. See here for more detail on how to use this feature.
Consumer Stabilization: The consumer now supports background heartbeating, which handles use cases with higher variance in message processing times more gracefully. Additionally, we now have a configuration option to override the maximum size of individual fetches to give you an easier way to tune memory usage. Due to these and other improvements, we’ve removed the beta label from the new consumer.
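The consumer settings touched by these changes can be sketched as Java `Properties`. This assumes the relevant options are `max.poll.interval.ms` (the processing-time bound that accompanies background heartbeating) and `fetch.max.bytes` (the per-fetch size override); the class name, broker address, group name, and values are placeholders.

```java
import java.util.Properties;

// Illustrative consumer settings; names of the class and group are hypothetical.
final class ConsumerTuning {
    static Properties build() {
        Properties p = new Properties();
        p.setProperty("bootstrap.servers", "localhost:9092"); // placeholder address
        p.setProperty("group.id", "example-group");           // hypothetical group name
        // Liveness is tracked by the background heartbeat thread.
        p.setProperty("session.timeout.ms", "10000");
        // Time allowed between polls is bounded separately, so slow
        // processing no longer has to fit inside the session timeout.
        p.setProperty("max.poll.interval.ms", "300000");
        // Soft cap on the data returned for one fetch request (50 MiB here).
        p.setProperty("fetch.max.bytes", String.valueOf(50 * 1024 * 1024));
        return p;
    }

    public static void main(String[] args) {
        build().forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

A `Properties` object like this one is what the `KafkaConsumer` constructor accepts, together with the key and value deserializer settings that are omitted here for brevity.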
Improved memory management: Streams applications now benefit from record caches. Notably, these caches are used to compact output records (similar to Kafka’s log compaction) so that fewer updates for the same record key are sent downstream. These new caches are enabled by default and typically reduce the load on your streams application, your Kafka cluster, and downstream applications and systems such as external databases.
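A minimal sketch of how the cache might be sized, again as Java `Properties`. It assumes the cache is controlled by the `cache.max.bytes.buffering` Streams setting and that a value of 0 turns caching off; the class name, application id, and values are hypothetical.

```java
import java.util.Properties;

// Illustrative Streams settings; the application id and sizes are examples only.
final class StreamsCacheConfig {
    // cacheBytes is the total record cache size; 0 disables caching.
    static Properties build(long cacheBytes) {
        if (cacheBytes < 0) {
            throw new IllegalArgumentException("cache size must not be negative");
        }
        Properties p = new Properties();
        p.setProperty("application.id", "example-streams-app"); // hypothetical id
        p.setProperty("bootstrap.servers", "localhost:9092");   // placeholder address
        p.setProperty("cache.max.bytes.buffering", String.valueOf(cacheBytes));
        // Caches are also flushed on commit, so a shorter interval
        // sends updates downstream sooner.
        p.setProperty("commit.interval.ms", "30000");
        return p;
    }

    public static void main(String[] args) {
        build(10L * 1024 * 1024).forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

A larger cache generally means fewer downstream updates per key at the cost of more memory, while a size of 0 forwards every update.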
Secure Quotas: Kafka client quotas, first introduced in 0.9.0.0, use the unauthenticated client.id configured by the user. Secure quotas now allow you to define quotas based on the authenticated user principal, which gives your cluster much better protection in a secure environment.
KAFKA-4093: Cluster id
KAFKA-4298: LogCleaner writes inconsistent compressed message set if topic message format != message format
KAFKA-3894: LogCleaner thread crashes if not even one segment can fit in the offset map
KAFKA-3396: Unauthorized topics are returned to the user
KAFKA-4019: LogCleaner should grow read/write buffer to max message size for the topic
KAFKA-3916: Connection from controller to broker disconnects
KAFKA-4129: Processor throw exception when getting channel remote address after closing the channel
KAFKA-3680: Make Java client classloading more flexible
KAFKA-2948: Kafka producer does not cope well with topic deletions
KAFKA-3937: Kafka Clients Leak Native Memory For Longer Than Needed With Compressed Messages
KAFKA-3854: Subsequent regex subscription calls fail
KAFKA-4098: NetworkClient should not intercept all metadata requests on disconnect
KAFKA-3775: Throttle maximum number of tasks assigned to a single KafkaStreams
KAFKA-3776: Unify store and downstream caching in streams
KAFKA-3812: State store locking is incorrect
KAFKA-3938: Fix consumer session timeout issue in Kafka Streams
KAFKA-4153: Incorrect KStream-KStream join behavior with asymmetric time window
KAFKA-3845: Support per-connector converters
KAFKA-3846: Connect record types should include timestamps
KAFKA-2894: WorkerSinkTask doesn’t handle rewinding offsets on rebalance
KAFKA-3054: Connect Herder fail forever if sent a wrong connector config or task config
KAFKA-3850: WorkerSinkTask should retry commits if woken up during rebalance or shutdown
KAFKA-4042: DistributedHerder thread can die because of connector & task lifecycle exceptions
This release was a huge community effort. We’ve welcomed contributions from 115 people (according to git shortlog). Big thanks to everyone who helped out!
Alex Glikson, Alex Loddengaard, Alexey Ozeritsky, Alexey Romanchuk, Andrea Cosentino, Andrew Otto, Andrey Neporada, Apurva Mehta, Arun Mahadevan, Ashish Singh, Avi Flax, Ben Stopford, Bharat Viswanadham, Bill Bejeck, Bryan Baugher, Chen Zhu, Christian Posta, Damian Guy, Dan Norwood, Dana Powers, David Chen, Derrick Or, Dong Lin, Dustin Cote, Edoardo Comar, Elias Levy, Eno Thereska, Eric Wasserman, Ewen Cheslack-Postava, Filipe Azevedo, Flavio Junqueira, Florian Hussonnois, Geoff Anderson, Grant Henke, Greg Fodor, Guozhang Wang, Gwen Shapira, Hans Deragon, Henry Cai, Ishita Mandhan, Ismael Juma, Jaikiran Pai, Jakub Dziworski, Jakub Pilimon, James Cheng, Jan Filipiak, Jason Gustafson, Jay Kreps, Jeff Klukas, Jendrik Poloczek, Jeyhun Karimov, Jiangjie Qin, Johnny Lim, Jonathan Bond, Jun Rao, Kaufman Ng, Kenji Yoshida, Konstantine Karantasis, Kota Uchida, Laurier Mantel, Liquan Pei, Luke Zaparaniuk, Magnus Reftel, Manikumar Reddy O, Manu Zhang, Mark Grover, Mathieu Fenniak, Matthias J. Sax, Maysam Yabandeh, Mayuresh Gharat, Michael G. Noll, Mickael Maison, Moritz Siuts, Nafer Sanabria, Nihed Bbarek, Onur Karaman, P. Thorpe, Peter Davis, Philippe Derome, Pierre Coquentin, Rajini Sivaram, Randall Hauch, Rekha Joshi, Roger Hoover, Rollulus, Ryan Pridgeon, Sahil Kharb, Samuel Taylor, Sasaki Toru, Satendra Kumar, Sebastien Launay, Shikhar Bhushan, Shuai Zhang, Som Sahu, Sriharsha Chintalapani, Sumit Arrawatia, Tao Xiao, Thanasis Katsadas, Tim Brooks, Todd Palino, Tom Crayford, Tom Rybak, Vahid Hashemian, Wan Wenli, William Thurston, William Yu, Xavier Léauté, Yang Wei, Yeva Byzek, Yukun Guo, Yuto Kawamura, Zack Dever, 1ambda, leisore, sven0726
To download Apache Kafka 0.10.1.0, visit the download page. Also stay tuned for the upcoming 3.1.0 release of Confluent’s Enterprise version of Kafka. In addition to all the 0.10.1.0 features mentioned above, we offer the Confluent Control Center to monitor your Kafka cluster, and tools for multi-datacenter replication and auto data balancing. This offering is backed by our subscription support. We also offer expert training and technical consulting to help get your organization started. Register to get notifications about this release.