Kafka のコストを 25% 以上削減 | Confluent コスト削減チャレンジに参加
We are proud to announce the release of Apache Kafka® 3.3 on behalf of the Apache Kafka community. The 3.3 release contains many new features and improvements. This blog post will highlight some of the more prominent features. For a full list of changes, be sure to check the be sure to check the 3.3.0 and 3.3.1 release notes.
For several years, the Apache Kafka community has been developing a new way to run Apache Kafka with self-managed metadata. This new mode, called KRaft mode, improves partition scalability and resiliency while simplifying deployments of Apache Kafka. It also eliminates the need to run an Apache ZooKeeperTM cluster alongside every Apache Kafka cluster.
The 3.3 release now marks KRaft mode as production ready for new clusters only. There are some features that are currently supported by Apache ZooKeeper (ZK) mode that are not yet supported by KRaft mode. For more information on these features and proposed KRaft timelines, read KIP-833.
KIP-833 marks KRaft as production-ready for new clusters in the Apache Kafka 3.3 release. KIP-833 also marks 3.5.0 as the bridge release. The bridge release is the release that would allow the migration of Apache Kafka clusters from ZK mode to KRaft mode.
KIP-778 allows the upgrade of KRaft clusters without the need for the infamous “double roll”. In order to facilitate upgrades of Apache Kafka in KRaft mode, we need the ability to upgrade controllers and brokers while holding back the use of new RPCs and record formats until the whole cluster has been upgraded.
KIP-841 improves the topic partitions’ availability during clean shutdown. It does this by enforcing the following invariants: 1) a fenced or in-controlled-shutdown replica is not eligible to be in the ISR; and 2) a fenced or in-controlled-shutdown replica is not eligible to become leader.
KIP-836 exposes the DescribeQuorum API to the Admin client and adds two new fields per replica to the response. This information can be used to query the availability and lag of the cluster metadata for the controllers and brokers in a KRaft cluster.
With KRaft mode, Apache Kafka added a new controller quorum to the cluster. These controllers need to be able to commit records for Apache Kafka to be available. KIP-835 measures availability by periodically causing the high-watermark and the last committed offset to increase. Monitoring services can compare that these last committed offsets are advancing. They can also use these metrics to check that all of the brokers and controllers are relatively within each other’s offset.
With KRaft mode, the cluster metadata replicated log is the source of metadata related information for all servers in the cluster. Any errors that occur while processing this log could lead to the in-memory state for the server becoming inconsistent. It is important that such errors are made visible. KIP-859 exposes metrics that can be monitored so that affected servers can be discovered.
KIP-794 improves the default partitioner to distribute non-keyed data evenly in batches among healthy brokers and less data to unhealthy brokers. For example, the p99 latency for a producer workload with abnormal behavior was reduced from 11s to 154ms.
KIP-373 allows users to create delegation tokens for other users. This allows the following use cases: 1) a designated superuser can create tokens without requiring individual user credentials; and 2) a designated superuser can run kafka clients on behalf of another user.
Log recovery is a process triggered when a Kafka server starts up, if it had a previous unclean shutdown. It is used to make sure the log is in a good state and is not corrupted. KIP-831 exposes metrics to allow users to monitor the progress of log recovery.
KIP-709 streamlines the process of fetching offsets from consumer groups so that a single request can be made to fetch offsets for multiple groups. This carries the following advantages: 1) it reduces request overhead; and 2) it simplifies client side code.
KIP-827 exposes an RPC for querying the disk total size and disk usage size per log directory. This is useful for tools that are interested in querying this information without relying on the exposed metrics.
KIP-851 adds the option in the Admin client for querying the committed offsets when using exactly once semantics.
addMetricIfAbsentmethod to Metrics
KIP-843 allows the metrics API to atomically query a metric if present or create a metric if absent.
KIP-824 allows the kafka-dump-logs tool to be configured to only scan and print the records at the start of the log segment instead of the entire log segment.
With the metrics available today in the plain consumer it is possible for users of Kafka Streams to derive the consumed throughput of their applications at the subtopology level, but the same is not true for the produced throughput.
KIP-846 fills in this gap and gives end users a way to compute the production rate of each subtopology by introducing two new metrics for the throughput at sink nodes. Even though it is possible to derive the consumed throughput with existing client level metrics, KIP-846 also adds two new metrics for the throughput at source nodes to provide an equally fine-grained metrics scope as for the newly added metrics at the sink nodes and to simplify the user experience.
KIP-834 adds the ability to pause and resume topologies. This can be used to reduce resources used or modify data pipelines. Paused topologies skip processing, punctuation, and standby tasks. For distributed Streams applications, each instance will need to be paused and resumed separately.
KIP-820 generalizes the KStream API to consolidate
Transformers (which could forward results) and
Processors (which could not). The change makes use of the new type-safe Processor API. This simplifies Kafka Streams, making it easier to use and learn.
KafkaStreams.close()API that forces the member to leave the consumer group
KIP-812 can efficiently close the stream permanently by forcing the member to leave the consumer group.
KIP-618 adds exactly one semantic support to source connectors. The Connect framework was expanded to atomically write source records and their source offsets to Apache Kafka, and to prevent zombie tasks from producing data to Apache Kafka.
In addition to all of the KIPs listed above, Apache Kafka 3.3 is packed with fixes and improvements. To learn more:
This was a community effort, so thank you to everyone who contributed to this release, including all our users and our 116 authors:
Akhilesh C, Akhilesh Chaganti, Alan Sheinberg, Aleksandr Sorokoumov, Alex Sorokoumov, Alok Nikhil, Alyssa Huang, Aman Singh, Amir M. Saeid, Anastasia Vela, András Csáki, Andrew Borley, Andrew Dean, andymg3, Aneesh Garg, Artem Livshits, A. Sophie Blee-Goldman, Bill Bejeck, Bounkong Khamphousone, bozhao12, Bruno Cadonna, Chase Thomas, chern, Chris Egerton, Christo Lolov, Christopher L. Shannon, CHUN-HAO TANG, Clara Fang, Clay Johnson, Colin Patrick McCabe, David Arthur, David Jacot, David Mao, Dejan Maric, dengziming, Derek Troy-West, Divij Vaidya, Edoardo Comar, Edwin, Eugene Tolbakov, Federico Valeri, Guozhang Wang, Hao Li, Hongten, Idan Kamara, Ismael Juma, Jacklee, James Hughes, Jason Gustafson, JK-Wang, jnewhouse, Joel Hamill, John Roesler, Jorge Esteban Quilcate Otoya, José Armando García Sancio, jparag, Justine Olshan, K8sCat, Kirk True, Konstantine Karantasis, Kvicii, Lee Dongjin, Levani Kokhreidze, Liam Clarke-Hutchinson, Lucas Bradstreet, Lucas Wang, Luke Chen, Manikumar Reddy, Marco Aurelio Lotz, Matthew de Detrich, Matthias J. Sax, Mickael Maison, Mike Lothian, Mike Tobola, Milind Mantri, nicolasguyomar, Niket, Niket Goel, Nikolay, Okada Haruki, Philip Nee, Prashanth Joseph Babu, Rajani Karuturi, Rajini Sivaram, Randall Hauch, Richard Joerger, Rittika Adhikari, RivenSun, Rohan, Ron Dagostino, ruanliang, runom, Sanjana Kaundinya, Sayantanu Dey, SC, sciclon2, Shawn, sunshujie1990, Thomas Cooper, Tim Patterson, Tom Bentley, Tom Kaszuba, Tomonari Yamashita, vamossagar12, Viktor Somogyi-Vass, Walker Carlson, Xavier Léauté, Xiaobing Fang, Xiaoyue Xue, xjin-Confluent, xuexiaoyue, Yang Yu, Yash Mayya, Yu, YU, yun-yun
This post was originally published by Jose Armando Garcia Sancio on The Apache Software Foundation blog.
Companies are looking to optimize cloud and tech spend, and being incredibly thoughtful about which priorities get assigned precious engineering and operations resources. “Build vs. Buy” is being taken seriously again. And if we’re honest, this probably makes sense. There is a lot to optimize.
Operating Kafka at scale can consume your cloud spend and engineering time. And operating everyday tasks like scaling or deploying new clusters can be complex and require dedicated engineers. This post focuses on how Confluent Cloud is 1) Resource Efficient, 2) Fully Managed, and 3) Complete.