Jan 5, 2017 | Read Time: 3 min

Log Compaction: Highlights in the Apache Kafka and Stream Processing Community – January 2017

Written By

Gwen Shapira, Engineering Manager, Confluent

Happy 2017! Wishing you a wonderful year full of fast and scalable data streams.

Many things have happened since we last shared the state of Apache Kafka® and the streams ecosystem. Let’s take a look!

Most importantly – we did a bug fix release. Apache Kafka 0.10.1.1 fixes some critical issues found in the 0.10.1.0 release. There is a pretty substantial list of fixes, so if you are running 0.10.1.0, we recommend upgrading to avoid running into issues we already resolved.

Kafka Summit! If you haven’t heard, last year was so successful that we are doing two events this year: New York on May 8th and San Francisco on August 28th. The call for papers is ending soon, so please submit your talk proposals this week!

There are many KIPs (Kafka Improvement Proposals) being discussed on the Kafka developer list, and many of them are huge improvements:

KIP-48 proposes adding delegation tokens to Kafka’s long list of authentication mechanisms. KIP-84 adds the SASL/SCRAM mechanism as well.
KIP-66 adds single message transformations to Kafka Connect, which will allow lightweight processing of individual events as they are streamed in and out of Kafka by the connectors. This is useful when you want to remove a sensitive field from the records, add timestamps or UUIDs, or route different events to different topics.
KIP-99 adds global tables to the Streams API in Kafka. This will allow loading small dimension tables, unpartitioned, into the local cache of each Streams API node, which means you can enrich a data stream with multiple dimensions without expensive re-partitioning for each join operation. This is similar to a broadcast join when running parallel queries in a traditional data warehouse.
KIP-101 proposes a modification to the message format in order to solve a few known issues that can, in rare cases, cause inconsistencies between replicas. Both the descriptions of the issues and the solution will be of interest to anyone who enjoys diving into distributed systems.
KIP-103 proposes new configuration that will allow separating traffic from internal and external clients. This will be useful for the many SREs who want to run internal traffic on a different network, and for container and cloud deployments where internal and external traffic have different configurations and costs.
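
To make the KIP-66 idea concrete, here is a minimal sketch in plain Python of what a chain of single message transformations does conceptually. It is not the Kafka Connect API: the record shape (a dict with `topic` and `value`), the function names, and the field names `ssn`, `ingested_at` and `event_id` are all hypothetical and chosen only for illustration.

```python
import time
import uuid

# Hypothetical record shape for this sketch: {"topic": str, "value": dict}.
# Each transform takes one record and returns a new record; nothing here
# looks at more than one message at a time, which is the point of an SMT.

def drop_field(field):
    """Build a transform that removes a (sensitive) field from the value."""
    def apply(record):
        value = {k: v for k, v in record["value"].items() if k != field}
        return {**record, "value": value}
    return apply

def add_metadata(clock=time.time, new_id=lambda: str(uuid.uuid4())):
    """Build a transform that stamps each record with a timestamp and a UUID."""
    def apply(record):
        value = {**record["value"], "ingested_at": clock(), "event_id": new_id()}
        return {**record, "value": value}
    return apply

def route_by_field(field):
    """Build a transform that appends the value of `field` to the topic name."""
    def apply(record):
        kind = record["value"].get(field)
        if kind is None:
            return record  # no routing key, leave the topic unchanged
        return {**record, "topic": f"{record['topic']}-{kind}"}
    return apply

def apply_chain(record, transforms):
    """Run the transforms in order on a single record."""
    for transform in transforms:
        record = transform(record)
    return record
```

A chain such as `[drop_field("ssn"), add_metadata(), route_by_field("type")]` passed to `apply_chain` covers the three use cases mentioned above. Because every step is stateless and sees one record at a time, this kind of processing stays lightweight; anything that needs joins or aggregation belongs in a stream processor instead.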

In addition to the many KIPs, there are some interesting releases, blogs and presentations I’d recommend checking out:

Apache Spark 2.1.0 was released. The highlight for the stream processing community is the addition of event-time watermarks to Structured Streaming.
It is a tradition to begin the new year with some predictions! For example, what do you think is the future of streams in financial services?
Apache Flink® published a review of the Flink community activities in the last year.
And Datanami reviewed 2016 for the entire Big Data industry.
This presentation has a nice overview of use-cases of streams, with more details than most.
A useful and funny presentation on the use of stream processing to handle billing in cloud environments.
And in case you were wondering why managing data for microservices is so challenging, Ben Stopford explains the Data Dichotomy.
A great discussion of why an embedded DB is a must for stream processing, including how this is done in Flink, Kafka Streams and Samza.

Gwen Shapira is a Software Engineer at Confluent. She has 15 years of experience working with code and customers to build scalable data architectures, integrating relational and big data technologies. She currently specialises in building real-time reliable data processing pipelines using Apache Kafka. Gwen is an Oracle Ace Director, an author of books including “Kafka, the Definitive Guide”, and a frequent presenter at data-related conferences. Gwen is also a committer on the Apache Kafka and Apache Sqoop projects.
