Feb 8, 2017 | Reading time: 3 min

Log Compaction – Highlights in the Apache Kafka and Stream Processing Community – February 2017

Written by

Gwen Shapira, Engineering Manager, Confluent

As always, we bring you news, updates and recommended content from the hectic world of Apache Kafka® and stream processing.

Sometimes it seems that every improvement in Apache Kafka is preceded by an involved KIP process. This month we merged a great patch that improves Kafka's 99th percentile latency without requiring any user-visible changes: https://issues.apache.org/jira/browse/KAFKA-4614. Not only does it make a fast system even faster, the JIRA itself is worthy of study. I wish all JIRAs included this level of research.

Some important improvements do require KIPs. Here is what we’ve seen in active discussions this month:

KIP-112: Handle disk failure for JBOD and its close relative KIP-113: Support replicas movement between log directories. Both these KIPs improve Kafka’s behavior in the common case where the broker’s data is written to a number of directly mounted disks on the broker server (rather than using RAID). With these improvements, Kafka will be able to survive failure of a single disk without taking down an entire broker, and it will allow admins to control the placement of replicas on disk – useful in cases where disks or replicas have uneven sizes.
KIP-117: Add a public AdminClient API for Kafka admin operations: This lets developers create, modify and delete topics and ACLs without using internal APIs, which are subject to incompatible changes, and without requiring a ZooKeeper connection from the applications.
KIP-98: The famous KIP that adds transactional semantics and exactly-once to Kafka is now under voting. This means that the Wiki now contains all the public changes. If you haven’t read it yet, now is a good time.
KIP-118 suggests we remove support for Java 7 in the next major release (0.11). We don’t know yet when 0.11 will get released, but we know it will be later than June.
KIP-110 suggests adding support for a new compression codec: ZStandard Compression. The new compression, written by Facebook, looks very promising.
KIP-109 suggests marking the old consumers as deprecated, as a hint for developers that they should start migrating to the new clients. As the KIP states, the old consumers are missing important features like security that were only added in the new clients.
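
To make the KIP-113 motivation above a little more concrete, here is a small Python sketch of why uneven replica sizes matter on JBOD. It is a toy model, not Kafka code: it assumes new replicas are placed on the log directory that currently holds the fewest replicas, ignoring their size. The function name, directory paths and replica sizes are all made up for illustration.

```python
def place_by_count(replica_sizes, log_dirs):
    """Toy model: put each replica on the directory holding the fewest replicas.

    Size is deliberately ignored, which is the assumption being illustrated.
    """
    count = {d: 0 for d in log_dirs}
    used_bytes = {d: 0 for d in log_dirs}
    placement = {}
    for name, size in replica_sizes:
        # On a tie, min() returns the first directory in the list.
        target = min(log_dirs, key=lambda d: count[d])
        placement[name] = target
        count[target] += 1
        used_bytes[target] += size
    return placement, used_bytes


# Hypothetical replicas as (name, size in MB) pairs.
replicas = [("orders-0", 900), ("clicks-0", 10), ("orders-1", 800), ("clicks-1", 20)]
placement, used = place_by_count(replicas, ["/mnt/disk1", "/mnt/disk2"])
print(placement)
print(used)
```

In this model both disks end up with two replicas each, yet almost all of the data sits on one of them. Letting an admin move a replica between log directories is one way to correct that kind of skew.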

Notable Blogs and Presentations:

One of the basic design patterns of Microservices is creating a local cache or materialized view. Keeping the cache updated can be a challenge. Zach Cox explains the challenges in maintaining a local cache for a service and provides several solutions using different Kafka APIs.
Plumbr used Kafka to transition from a monolith to microservices as they scaled their architecture.
Sky Betting & Gaming published their Kafka-centric streaming architecture.
And since everyone loves benchmarks: Comparing the different compression codecs in Apache Kafka.
Trulia talks about how they use Kafka to drive a machine learning system, which they use to offer personalized experiences in mobile and desktop.
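
The local cache or materialized view pattern from the first item above can be sketched in a few lines of Python. This is only an illustration of the general idea, not the approach from the linked post: the list of tuples stands in for records read from a Kafka topic (no Kafka client is used), and treating a `None` value as a delete is an assumption modeled on how tombstones work in compacted topics. All names are hypothetical.

```python
def apply_changelog(view, records):
    """Apply (key, value) records in order; a None value deletes the key."""
    for key, value in records:
        if value is None:
            view.pop(key, None)  # tombstone: drop the key if it is present
        else:
            view[key] = value    # upsert: the latest value for a key wins
    return view


# Hypothetical changelog, oldest record first.
changelog = [
    ("user-1", {"name": "Ada"}),
    ("user-2", {"name": "Bob"}),
    ("user-1", {"name": "Ada L."}),
    ("user-2", None),
]
cache = apply_changelog({}, changelog)
print(cache)
```

Replaying the same records from the beginning always rebuilds the same view, which is what lets a service recreate its cache after a restart. The hard part the post discusses, keeping the view current as new records arrive, is not covered by this sketch.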

Gwen Shapira is a Software Engineer at Confluent. She has 15 years of experience working with code and customers to build scalable data architectures, integrating relational and big data technologies. She currently specialises in building reliable real-time data processing pipelines using Apache Kafka. Gwen is an Oracle Ace Director, an author of books including “Kafka: The Definitive Guide”, and a frequent presenter at data-related conferences. Gwen is also a committer on the Apache Kafka and Apache Sqoop projects.