Log Compaction | Highlights in the Apache Kafka and Stream Processing Community | October 2016

This month the community has been focused on the upcoming release of Apache Kafka 0.10.1.0. Led by the fearless release manager, Jason Gustafson, we voted on a release plan, cut branches and started voting on the first release candidate. Please contribute to the community by downloading the release candidate, testing it out and letting everyone know how it went. If no serious bugs are found, we are hoping to finalize the release by mid-October.

In addition to the vote, we gave our website a quick facelift, contributed by Derrick Or. We appreciated the community's feedback, and the issues it raised were quickly addressed.

And as usual, there are several very lively discussions in the community:

  • KIP-74: Proposal to limit not just the amount of data a consumer fetch returns per partition, but also the total amount of data returned for each fetch request. This gives users better control over consumer memory usage. Even better, it allows consumers to make progress even if a partition contains messages larger than the maximum fetch size. This proposal has been merged and will be part of the 0.10.1.0 release.
  • KIP-79: The proposal to add methods for searching by timestamp to the new consumer was accepted and merged. It will be included in the next release, to everyone’s great joy.
  • KIP-82: Proposal for adding headers to Kafka messages. This proposal is very popular because so many organizations are using headers internally. It is also controversial – the Kafka project has a long tradition of keeping the message completely unstructured and letting users and clients put whatever structure they need inside the message. Whatever the decision is, it will have a serious impact on the Apache Kafka ecosystem.
  • KIP-83: A much-welcomed proposal that allows clients with different security configurations to be instantiated in the same JVM. Patches by Rajini Sivaram and Edoardo Comar are already available, and once they are integrated we will be able to update MirrorMaker to support different security configurations on the source and target clusters.
  • KIP-85: Allowing clients to take JAAS configurations dynamically rather than via a file. This will be huge for those of us implementing microservices in containers – adding files to containers has been very inconvenient.
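
To make the KIP-74 behavior a little more concrete, here is a minimal sketch of how a per-partition limit and an overall per-request limit might interact. It is a toy model, not the broker's actual implementation: the function name, its parameters, and the representation of messages as plain byte sizes are all assumptions made for illustration. The one rule it tries to capture is the progress guarantee described above: if the first message in the first non-empty partition is larger than the limits, it is still returned.

```python
from typing import Dict, List


def build_fetch_response(
    partitions: Dict[str, List[int]],
    max_partition_bytes: int,
    max_response_bytes: int,
) -> Dict[str, List[int]]:
    """Toy model of one fetch request (hypothetical helper, not a Kafka API).

    `partitions` maps a partition name to the sizes, in bytes, of its
    pending messages in log order. The result maps each partition to the
    message sizes that would be returned.
    """
    response: Dict[str, List[int]] = {}
    remaining = max_response_bytes   # budget for the whole response
    returned_any = False             # has any message been selected yet?

    for name, sizes in partitions.items():
        taken: List[int] = []
        partition_budget = max_partition_bytes
        for size in sizes:
            fits = size <= partition_budget and size <= remaining
            if not fits:
                if not returned_any:
                    # Progress guarantee: an oversized first message is
                    # returned on its own so the consumer is not stuck.
                    taken.append(size)
                    returned_any = True
                    remaining = 0
                break
            taken.append(size)
            returned_any = True
            partition_budget -= size
            remaining -= size
        if taken:
            response[name] = taken
        if remaining <= 0:
            break
    return response


# The overall limit stops the response before the second partition is served.
print(build_fetch_response({"p0": [400, 400, 400], "p1": [300, 300]}, 1000, 1000))
# prints {'p0': [400, 400]}

# A 5000-byte message exceeds both limits but is still returned, alone.
print(build_fetch_response({"p0": [5000, 100]}, 1000, 2000))
# prints {'p0': [5000]}
```

In the real consumer these two limits correspond to the `max.partition.fetch.bytes` and `fetch.max.bytes` settings; the sketch only illustrates the idea and ignores details such as message batching.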

In addition to the ongoing Kafka improvements, there are other interesting news items and blog posts:

If you are interested in learning all about streaming data platforms, Confluent has released a 6-part online talk series focusing on Apache Kafka. You can view the recordings for the first two talks in the series by Jay Kreps and Jun Rao, and register for the upcoming sessions at /apache-kafka-talk-series.

  • Gwen Shapira is a Software Engineer at Confluent. She has 15 years of experience working with code and customers to build scalable data architectures, integrating relational and big data technologies. She currently specialises in building real-time reliable data processing pipelines using Apache Kafka. Gwen is an Oracle Ace Director, an author of books including “Kafka: The Definitive Guide”, and a frequent presenter at data-related conferences. Gwen is also a committer on the Apache Kafka and Apache Sqoop projects.
