
Log Compaction – Highlights in the Apache Kafka® and Stream Processing Community – March 2017

Written by Gwen Shapira

Big news this month! First and foremost, Confluent Platform 3.2.0, which includes Apache Kafka® 0.10.2.0, has been released! Read about the new features, check out all 200 bug fixes and performance improvements, and then download Confluent Platform 3.2.0 and try it out.

Thanks to Ismael Juma, there is already a plan for the next release of Apache Kafka, 0.11.0.0, so you can check out the features planned for June. The big-ticket items are exactly-once semantics and transactions, dropping support for Java 7, and disabling unclean leader election by default.

Notable KIPs this month include:

Voted:

  • KIP-107: Add purgeDataBefore() API in AdminClient – This KIP lets developers explicitly ask Kafka to purge data, in addition to the usual time-based and size-based retention policies. The API is especially useful for multi-step stream processing jobs, which can now remove intermediate data once downstream jobs have processed it.
  • KIP-119: Drop Support for Scala 2.10 in Kafka 0.11 – We added support for Scala 2.12 in Kafka 0.10.2.0; now it is time to drop support for the older Scala 2.10.
  • KIP-121: Add KStream peek method – A new Streams DSL operation. Like map(), it is applied to every record, but it is intended for side effects rather than for modifying the records in the stream. This is useful for debugging and diagnostics: peek() can update a monitoring metric or print the current record, much like Java 8’s Stream#peek() method.
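Because KIP-121 had only just been voted, the sketch below does not use the Kafka Streams API. It illustrates the intended semantics with Java 8's java.util.stream.Stream#peek, the method the KIP cites as its model: peek() sees every element and passes it through unchanged, while map() is what actually transforms the stream. The class name PeekDemo and the recordsSeen counter (standing in for a monitoring metric) are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;
import java.util.stream.Collectors;

final class PeekDemo {

    // Hypothetical stand-in for a monitoring metric; peek() bumps it as a side effect.
    static final AtomicLong recordsSeen = new AtomicLong();

    // peek() observes each element and passes it through unchanged;
    // only map() alters what flows downstream.
    static List<String> process(List<String> records) {
        return records.stream()
                .peek(r -> recordsSeen.incrementAndGet())          // side effect: update the metric
                .peek(r -> System.out.println("saw record: " + r)) // side effect: print for debugging
                .map(String::toUpperCase)                          // the actual transformation
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> out = process(Arrays.asList("click", "view", "click"));
        System.out.println(out);                            // [CLICK, VIEW, CLICK]
        System.out.println("metric: " + recordsSeen.get()); // metric: 3
    }
}
```

Removing both peek() calls would leave the output list identical; only the printed lines and the counter would disappear, which is exactly the "side effects only" contract described above.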

Discussed:

  • KIP-129: Streams Exactly-Once Semantics – Now that work on adding exactly-once semantics and transactions to Kafka is in progress, it is time to bring exactly-once processing semantics to Kafka’s Streams API.
  • KIP-122: Add Reset Consumer Group Offsets tooling – Ever had a consumer group fail on a bad record and wished you could just tell the consumer group to skip ahead a bit? So did we. Now we are discussing the best CLI to do it.
  • KIP-124: Request rate quotas – Kafka currently lets administrators limit the bandwidth a client may use to produce and consume, but there is still no control over how much CPU a client uses. This functionality will be very useful for anyone running a multi-tenant cluster, and the discussion on how best to model clients’ CPU consumption and how administrators should control it through configuration is fascinating.
  • KIP-125: ZookeeperConsumerConnector to KafkaConsumer Migration and Rollback – We want to deprecate the old 0.8.x consumer in favor of the new consumer, but some teams have trouble migrating because there is no support for a rolling upgrade between the two consumer types. This KIP proposes a solution to this problem, allowing us to remove the old consumer.
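KIP-122 was still settling on its command-line interface, so the following does not show any real tool. It is a minimal sketch of the arithmetic behind "skip ahead a bit", assuming a hypothetical shift operation that moves each partition's committed offset by a fixed amount and clamps the result to the partition's current log start and log end offsets. OffsetShift, shift and shiftAll are illustrative names, not Kafka APIs.

```java
import java.util.Map;
import java.util.TreeMap;

final class OffsetShift {

    // Hypothetical helper (not a Kafka API): shift one partition's committed
    // offset by `shiftBy`, clamping to the valid range [logStart, logEnd].
    static long shift(long committed, long shiftBy, long logStart, long logEnd) {
        long target = committed + shiftBy;
        return Math.max(logStart, Math.min(logEnd, target));
    }

    // Apply the same shift to every partition of the group.
    // Keys are partition numbers; all three maps are assumed to share them.
    static Map<Integer, Long> shiftAll(Map<Integer, Long> committed, long shiftBy,
                                       Map<Integer, Long> logStart,
                                       Map<Integer, Long> logEnd) {
        Map<Integer, Long> result = new TreeMap<>();
        for (Map.Entry<Integer, Long> e : committed.entrySet()) {
            int partition = e.getKey();
            result.put(partition, shift(e.getValue(), shiftBy,
                    logStart.get(partition), logEnd.get(partition)));
        }
        return result;
    }

    public static void main(String[] args) {
        // Partition 0 is stuck on a bad record at offset 42. The committed offset
        // names the next record to read, so shifting by 1 skips exactly that record.
        Map<Integer, Long> committed = new TreeMap<>();
        committed.put(0, 42L);
        committed.put(1, 100L); // already caught up to the log end
        Map<Integer, Long> logStart = new TreeMap<>();
        logStart.put(0, 0L);
        logStart.put(1, 0L);
        Map<Integer, Long> logEnd = new TreeMap<>();
        logEnd.put(0, 500L);
        logEnd.put(1, 100L);
        System.out.println(shiftAll(committed, 1, logStart, logEnd)); // {0=43, 1=100}
    }
}
```

Clamping keeps the group from committing an offset that has already been deleted (below the log start) or that does not exist yet (past the log end).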

Notable Blog posts:

Our Confluent Community Slack channel is thriving, with 500 members and lively discussions about Apache Kafka and all of its ecosystem projects. The community is still new, but starting next month we’ll share highlights from its discussions. You are invited to join.

And most importantly, we announced the agenda for Kafka Summit NYC and a Kafka Summit hackathon. We look forward to seeing all of you there! Register now!

Gwen Shapira is a Software Engineer at Confluent. She has 15 years of experience working with code and customers to build scalable data architectures, integrating relational and big data technologies. She currently specialises in building real-time, reliable data processing pipelines using Apache Kafka. Gwen is an Oracle Ace Director, an author of books including “Kafka, the Definitive Guide”, and a frequent presenter at data-related conferences. Gwen is also a committer on the Apache Kafka and Apache Sqoop projects.
