Kafka のコストを 25% 以上削減 | Confluent コスト削減チャレンジに参加
It’s official: Apache Kafka® 2.3 has been released! Here is a selection of some of the most interesting and important features we added in the new release.
In order to keep your data safe, Kafka creates several replicas of it on different brokers. Kafka will not allow writes to proceed unless the partition has a minimum number of in-sync replicas. This is called the “minimum ISR.”
Kafka already had metrics showing the partitions that had fewer than the minimum number of in-sync replicas. In this release, KIP-427 adds additional metrics showing partitions that have exactly the minimum number of in-sync replicas. By monitoring these metrics, users can see partitions that are on the verge of becoming under-replicated.
Additionally, KIP-351 adds the –under-min-isr command line flag to the kafka-topics command. This allows users to easily see which topics have fewer than the minimum number of in-sync replicas.
To a first-order approximation, previous values of a key in a compacted topic get compacted some time after the latest key is written. Only the most recent value is available, and previous values are not. However, it has always been possible to set the minimum amount of time a key would stick around before being compacted, so we don’t lose the old value too quickly. Now, with KIP-354, it’s possible to set the maximum amount of time an old value will stick around. The new parameter max.log.compaction.time.ms specifies how long an old value may possibly live in a compacted topic. This can be used in complying with data retention regulations such as the GDPR.
Previously, Kafka would prioritize opening new TCP connections over handling existing connections. This could cause problems if clients attempted to create many new connections within a short time period. KIP-402 prioritizes existing connections over new ones, which improves the broker’s resilience to connection storms. The KIP also adds a max.connections per broker setting.
In order to keep replicas up to date, each broker maintains a pool of replica fetcher threads. Each thread in the pool is responsible for fetching replicas for some number of follower partitions. Previously, if one of those partitions failed, the whole thread would fail with it, causing under-replication on potentially hundreds of partitions. With this KIP, if a single partition managed by a given replica fetcher thread fails, the thread continues handling the remainder of its partitions.
When the broker starts up after an unclean shutdown, it checks the logs to make sure they have not been corrupted. This JIRA optimizes that process so that Kafka only checks log segments that haven’t been explicitly flushed to disk. Now, the time required for log recovery is no longer proportional to the number of logs. Instead, it is proportional to the number of unflushed log segments. Some of the benchmarks which Zhanxiang Huang discusses on the JIRA show up to a 50% reduction in broker startup time.
In Kafka Connect, worker tasks are distributed among the available worker nodes. When a connector is reconfigured or a new connector is deployed—as well as when a worker is added or removed—the tasks must be rebalanced across the Connect cluster. This helps ensure that all of the worker nodes are doing a fair share of the Connect work. In 2.2 and earlier, a Connect rebalance caused all worker threads to pause while the rebalance proceeded. As of KIP-415, rebalancing is no longer a stop-the-world affair, making configuration changes a more pleasant thing.
For a deep dive on Incremental Cooperative Rebalancing, see this blog post by Konstantine Karantasis.
A running Connect cluster contains several different thread pools. Each of these threads emits its own logging, as one might expect. However, this makes it difficult to untangle the sequence of events involved in a single logical operation, since the parts of that operation are running asynchronously in their various threads across the cluster. This KIP adds some context to each Connect log message, making it much easier to make sense of the state of a single connector over time.
Take a look at Robin Moffatt’s blog post for more details about Kafka Connect improvements in Apache Kafka 2.3.
Prior to this KIP, message timestamps were not stored in the Streams state store. Only the key and value were there. With this KIP, timestamps are now included in the state store. This KIP lays the groundwork to enable future features like handling out-of-order messages in KTables and implementing TTLs for KTables.
These KIPs add in-memory implementations for the Kafka Streams window store and session store. Previously, the only component with an in-memory implementation was the state store. The in-memory implementations provide higher performance, in exchange for lack of persistence to disk. In many cases, this can be a very good tradeoff.
The first half of this KIP, the flatTransform() method, was delivered in Kafka 2.2. The flatTransform() method is very similar to flatMap(), in that it takes a single input record and produces one or more output records. flatMap() does this in a type-safe way but without access to the ProcessorContext and the state store. We’ve been able to use the Processor API to perform this same kind of operation with access to the ProcessorContext and the state store, but without the type safety of flatMap(). flatTransform() gave us the best of both worlds: processor API access, plus compile-time type checking.
flatTransformValues(), just introduced in the completed KIP-313 in Kafka 2.3, is to flatTransform() as flatMapValues() is to flatMap(). It lets us do processor-API-aware computations that return multiple records for each input record without changing the message key and causing a repartition.
Thanks for reading! Check out the release notes for more details about all the great stuff in Kafka 2.3. Also, check out this YouTube video on the highlights of the release!
This post was originally published by Colin McCabe on The Apache Software Foundation blog.
Companies are looking to optimize cloud and tech spend, and being incredibly thoughtful about which priorities get assigned precious engineering and operations resources. “Build vs. Buy” is being taken seriously again. And if we’re honest, this probably makes sense. There is a lot to optimize.
Operating Kafka at scale can consume your cloud spend and engineering time. And operating everyday tasks like scaling or deploying new clusters can be complex and require dedicated engineers. This post focuses on how Confluent Cloud is 1) Resource Efficient, 2) Fully Managed, and 3) Complete.