
Log Compaction | Highlights in the Apache Kafka and Stream Processing Community | September 2016

Written By Gwen Shapira

It is September, and it’s evident that everyone is back from their summer vacation! We released Apache Kafka 0.10.0.1, which fixes bugs found in the 0.10.0 release. In our last meeting, we agreed to give time-based releases a try and immediately started planning Apache Kafka 0.10.1.0.

  • Confluent Platform 3.0.1 and Apache Kafka 0.10.0.1 were released. Lots of important bug fixes! If you are on Apache Kafka 0.10.0 or Confluent Platform 3.0.0, we recommend upgrading. If you are on an older release, please make sure you upgrade directly to the bugfix version.
  • We agreed to try a time-based release plan, aiming for three Apache Kafka releases a year (one every four months) and guaranteeing rolling upgrades for a period of two years.
  • We started planning the next Apache Kafka release, which will be version 0.10.1.0. Many thanks to Jason Gustafson, Kafka’s newest committer, for volunteering to drive the release. As usual, the community is encouraged to participate; take a look at the release plan to learn how.
  • KIP-62 has been merged and will be included in Apache Kafka 0.10.1.0 and Confluent Platform 3.1.0. This KIP adds a background thread to the Kafka Consumer that sends heartbeats, which keeps consumers in their group even while they spend a long time between calls to poll() (up to a new configurable limit, max.poll.interval.ms). This should make it much easier to write consumers, especially consumers that need to process large amounts of data between iterations.
  • KIP-63, a proposal for improving caching in the Streams API in Kafka, was approved. This is a significant performance optimization that coalesces multiple updates to the same key before sending them downstream, which reduces the load on Kafka clusters and on downstream external systems. It also paves the way for implementing new “trigger” behaviors.
  • KIP-71 was approved, allowing messages in a topic to be both compacted and deleted. This will let admins bound the disk usage of compacted topics by removing records that are older than the retention time or that exceed the retention size limit.
  • KIP-73 was approved, adding replication quotas (throttling) to Apache Kafka. This feature is especially useful when reassigning replicas to brokers: it lets admins limit the resources used by the reassignment process, thereby reducing its risk. Replica reassignment has long been a difficult process in Apache Kafka, and we are excited about this improvement.
  • KIP-79, a proposal to evolve the Apache Kafka protocol to allow requesting offsets by timestamp (using the new timestamp indexes), is under active discussion. You are invited to take a look and share your feedback with the Kafka community.
  • Ben Stopford gave a very popular presentation on how Microservices and Apache Kafka fit together.
  • If you are curious about the internals of the new Kafka Consumer Groups, you can watch this presentation from the Kafka meetup at LinkedIn.
  • Want to learn how to choose a stream processing framework? Neha Narkhede and Stephan Ewen compare the Streams API in Kafka and Flink, providing good decision guidelines in the process.
  • Are Kafka Connect and Kafka Streams ready for production? The Kafka community says yes! LINE Corp. explains how it uses Kafka Streams in large-scale production, and WePay describes its use of Kafka Connect at a similar scale.
  • Grant Henke explains the architectural benefits of using Apache Kafka to decouple systems.
  • Confluent has updated the schedule of training classes for developers and operators of Kafka. Online courses are also available.
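To make the KIP-62 change concrete, here is a minimal Python model of how a consumer's group membership can be judged once heartbeats come from a background thread. It is a simplified sketch, not Kafka code: `ConsumerTimers` and `member_state` are hypothetical names, and the timeout values are example numbers standing in for the `session.timeout.ms` and `max.poll.interval.ms` consumer settings.

```python
from dataclasses import dataclass


@dataclass
class ConsumerTimers:
    """Hypothetical snapshot of one consumer's timers (milliseconds)."""
    last_heartbeat_ms: int  # last heartbeat sent by the background thread
    last_poll_ms: int       # last time the application called poll()


def member_state(t: ConsumerTimers, now_ms: int,
                 session_timeout_ms: int = 10_000,
                 max_poll_interval_ms: int = 300_000) -> str:
    """Simplified KIP-62 liveness check: 'alive', 'session_expired' or 'poll_timeout'."""
    # No heartbeat within the session timeout: the coordinator assumes the
    # process is dead and removes it from the group.
    if now_ms - t.last_heartbeat_ms > session_timeout_ms:
        return "session_expired"
    # Heartbeats are still flowing, but the application has not called poll()
    # within the maximum poll interval, so the member leaves the group.
    if now_ms - t.last_poll_ms > max_poll_interval_ms:
        return "poll_timeout"
    return "alive"
```

In this model, a consumer that spends 60 seconds processing one batch stays `"alive"` as long as its heartbeat thread keeps running (`ConsumerTimers(58_000, 0)` at `now_ms=60_000`). Before KIP-62, heartbeats were sent only from poll(), which corresponds to the heartbeat timestamp staying at the last poll, so the same 60-second gap would give `"session_expired"`.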
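The coalescing idea behind KIP-63 can be sketched as a small key-value cache that overwrites repeated updates to the same key and forwards only the latest value when an entry is evicted or flushed. This is an illustrative model built around a hypothetical `CoalescingCache` class; the real Streams cache is bounded in bytes (the `cache.max.bytes.buffering` setting) rather than by entry count, and `forwarded` here simply stands in for records sent downstream.

```python
from collections import OrderedDict


class CoalescingCache:
    """Hypothetical sketch: keep only the latest value per key until eviction or flush."""

    def __init__(self, max_entries: int):
        self.max_entries = max_entries
        self._entries = OrderedDict()  # key -> latest value, least recently updated first
        self.forwarded = []            # stands in for records sent downstream

    def put(self, key, value) -> None:
        # Overwrite an existing key instead of emitting another downstream record,
        # and mark it as the most recently updated entry.
        if key in self._entries:
            self._entries.move_to_end(key)
        self._entries[key] = value
        # Over capacity: evict the least recently updated entry and forward it.
        if len(self._entries) > self.max_entries:
            self.forwarded.append(self._entries.popitem(last=False))

    def flush(self) -> None:
        # For example on commit: forward every remaining entry exactly once.
        while self._entries:
            self.forwarded.append(self._entries.popitem(last=False))
```

Feeding the updates `("a", 1), ("a", 2), ("b", 1), ("a", 3)` into a large cache and then calling `flush()` forwards only `[("b", 1), ("a", 3)]`: four updates become two downstream records, which is the load reduction the bullet describes.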
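For KIP-71, the combined behavior can be modeled at the record level: compaction keeps the latest record per key, and deletion then drops anything older than the retention time. This is a simplified, hypothetical sketch (`Record` and `clean_log` are illustrative names); real brokers clean whole log segments, and the proposal enables the combination through a topic setting of `cleanup.policy=compact,delete`.

```python
from typing import Dict, List, NamedTuple


class Record(NamedTuple):
    offset: int
    key: str
    value: str
    timestamp_ms: int


def clean_log(log: List[Record], now_ms: int, retention_ms: int) -> List[Record]:
    """Record-level model of compact+delete; assumes `log` is in offset order."""
    # Compaction: later offsets overwrite earlier ones, so only the latest
    # record for each key survives.
    latest: Dict[str, Record] = {}
    for rec in log:
        latest[rec.key] = rec
    compacted = sorted(latest.values(), key=lambda r: r.offset)
    # Deletion: also drop records older than the retention time, even when
    # they are the latest value for their key.
    return [r for r in compacted if now_ms - r.timestamp_ms <= retention_ms]
```

For example, with `log = [Record(0, "user1", "a", 1000), Record(1, "user2", "x", 2000), Record(2, "user1", "b", 9000)]`, calling `clean_log(log, now_ms=10_000, retention_ms=5_000)` keeps only offset 2. The `user2` record is the latest value for its key but older than the retention time, so it is removed; with compaction alone it would be kept indefinitely.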
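The arithmetic behind replication throttling is ordinary rate limiting. The sketch below uses two hypothetical helpers: one computes how long a sender should pause so that its average rate stays under a bytes-per-second limit, and one gives a lower bound on how long a reassignment takes under that limit. It is a generic model, not the broker's actual quota implementation; in the released tooling the limit is passed in bytes per second, for example through the `--throttle` option of `kafka-reassign-partitions.sh`.

```python
def throttle_delay_s(bytes_sent: int, elapsed_s: float, limit_bytes_per_s: float) -> float:
    """Seconds to pause so the average rate bytes_sent / time stays within the limit."""
    if limit_bytes_per_s <= 0:
        raise ValueError("limit_bytes_per_s must be positive")
    # How long this much data should take at the throttled rate.
    min_duration_s = bytes_sent / limit_bytes_per_s
    # If we were faster than allowed, wait out the difference; otherwise don't wait.
    return max(0.0, min_duration_s - elapsed_s)


def reassignment_time_s(total_bytes: int, limit_bytes_per_s: float) -> float:
    """Lower bound on how long moving total_bytes takes under the throttle."""
    if limit_bytes_per_s <= 0:
        raise ValueError("limit_bytes_per_s must be positive")
    return total_bytes / limit_bytes_per_s
```

For example, moving 500 GB (500 × 10⁹ bytes) at a 50 MB/s throttle takes at least `reassignment_time_s(500 * 10**9, 50 * 10**6)`, or 10,000 seconds (roughly 2.8 hours). That slower reassignment is the price admins pay for keeping the rest of the cluster's traffic safe.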
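The lookup KIP-79 discusses amounts to searching a timestamp index for the earliest offset whose timestamp is at or after a target time. Since the protocol change is still under discussion, the following is only a sketch of that idea over an in-memory, sorted list of `(timestamp_ms, offset)` pairs; `offset_for_time` is a hypothetical name, and Kafka's on-disk time index is sparse and more involved.

```python
import bisect
from typing import List, Optional, Tuple


def offset_for_time(index: List[Tuple[int, int]], target_ms: int) -> Optional[int]:
    """Earliest offset whose timestamp is >= target_ms, or None if there is none.

    `index` holds (timestamp_ms, offset) pairs sorted by timestamp.
    """
    timestamps = [ts for ts, _ in index]
    # bisect_left finds the first position whose timestamp is >= target_ms.
    pos = bisect.bisect_left(timestamps, target_ms)
    if pos == len(index):
        return None  # every indexed message is older than target_ms
    return index[pos][1]
```

With `index = [(1000, 0), (2000, 15), (3000, 42)]`, a target of 1500 returns offset 15, and a target of 4000 returns `None` because no indexed message is that recent.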

Gwen Shapira is a Software Engineer at Confluent. She has 15 years of experience working with code and customers to build scalable data architectures, integrating relational and big data technologies. She currently specialises in building real-time reliable data processing pipelines using Apache Kafka. Gwen is an Oracle Ace Director, an author of books including “Kafka, the Definitive Guide”, and a frequent presenter at data-related conferences. Gwen is also a committer on the Apache Kafka and Apache Sqoop projects.
