Log Compaction | Highlights in the Apache Kafka and Stream Processing Community | August 2016

It is August already, and this marks exactly one year of monthly “Log Compaction” blog posts summarizing news from the very active Apache Kafka and stream processing community. We hope you enjoy them and, as usual, let us know if you have news to share.

  • The Apache Kafka community is preparing to release a bugfix for version 0.10.0. The new release will be 0.10.0.1 and we are currently voting on a release candidate – hopefully we won’t find critical issues and the release will be available soon.
  • The ongoing work on KIP-4 has seen significant progress. This work will allow all client libraries to manage topics without depending on core Kafka or ZooKeeper:
      • An API to create new topics through the wire protocol was voted in and committed
      • An API to delete topics was voted in and a patch is currently under review
      • An API to manage ACLs is currently under discussion
  • KIP-67, which adds queryable state to Kafka Streams, was voted in and committed. This new feature will allow other applications to directly query the latest processing results of your Kafka Streams application (i.e. its current state). This means that, for many use cases, you no longer need to operate and interface with external systems or databases to share data between applications. The result is a simplified, more app-centric architecture.
  • Michael Noll published two more blogs on Kafka Streams: Secure stream processing and Elastic Scaling in Kafka Streams.
  • Alex Loddengaard published his best practices for running Apache Kafka in AWS. There have been tons of questions in the community about this topic as cloud deployments are becoming more and more popular – so we shared our answers in this blog post.
  • Spark 2.0 was released last week with many improvements to Spark Streaming. This blog post gives an overview of what’s new in Spark Streaming.
  • Back in April, when we ran the Kafka Connect and Streams hackathon, one of my favorite projects was by SVDS. They streamed data from a Bluetooth brain monitoring device to Kafka, used Kafka Connect to stream data out to OpenTSDB, and then used Grafana to visualize the brain activity! How cool is that? SVDS blogged all the fun details of their brain monitoring project for your inspiration.
  • Recommendation powerhouse Yelp blogged about their real-time data pipeline architecture – and it is gorgeous. We recommend checking it out as a reference for anyone tasked with building similar infrastructure.
  • Apache Kafka training is offered by Confluent and our partners. New classes have just been published, including online-based training. /training.
  • Gwen Shapira is a Software Engineer at Confluent. She has 15 years of experience working with code and customers to build scalable data architectures, integrating relational and big data technologies. She currently specialises in building real-time reliable data processing pipelines using Apache Kafka. Gwen is an Oracle Ace Director, an author of books including “Kafka, the Definitive Guide”, and a frequent presenter at data-related conferences. Gwen is also a committer on the Apache Kafka and Apache Sqoop projects.
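
To make the queryable state item (KIP-67) above a little more concrete, here is a minimal conceptual sketch in plain Python. It is not the Kafka Streams API: the class `WordCountProcessor` and its methods `process` and `query` are hypothetical names invented for illustration. The sketch only shows the idea that the processing application keeps its latest results in a local store and answers lookups itself, rather than writing them to an external database first. It assumes a single in-process instance and ignores partitioning, fault tolerance, and networking.

```python
class WordCountProcessor:
    """Toy stand-in for a stream processing app with local state.

    All names are hypothetical; this does not use or mimic the real
    Kafka Streams API.
    """

    def __init__(self):
        # Local state store: the latest count for each word seen so far.
        self._counts = {}

    def process(self, record: str) -> None:
        """Consume one input record and update the local state."""
        for word in record.lower().split():
            self._counts[word] = self._counts.get(word, 0) + 1

    def query(self, word: str) -> int:
        """Read-only lookup of the latest result for a word.

        Callers ask the application directly, so no external database
        is needed to share the current counts.
        """
        return self._counts.get(word, 0)


processor = WordCountProcessor()
for line in ["kafka streams", "kafka connect"]:
    processor.process(line)

print(processor.query("kafka"))  # 2
print(processor.query("flink"))  # 0
```

In this toy version the "other application" is simply code calling `query` in the same process; how a real deployment exposes such lookups to remote callers is outside the scope of the sketch.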
