Consumer Group IDs are a vital part of consumer configuration in Apache Kafka®. Setting the consumer Group ID determines what group a consumer belongs to, which has some major consequences. There are three areas in which Group IDs are particularly pertinent:
Detecting new data
Work sharing
Fault tolerance
Let's dive in.
Kafka consumers do the work of reading event streams. They read events, or messages, from logs called topics, which producers write to. Topics are further split into partitions, which are append-only logs that store the messages. This enables each topic to be hosted and replicated across a number of brokers.
As you can see in the diagram, a given consumer in a consumer group can read from multiple partitions, including multiple partitions housed in the same topic.
Group IDs are associated through the broker with bits of information called offsets, which specify the location of a given event within a partition and, as such, represent progress through the topic. Offsets in consumer groups serve the same purpose as bookmarks or sticky tabs in books. You can learn more about offsets in our FAQ.
You can use a particular Group ID's offset to check whether there's been new data written to the partition. If there's an event with a larger offset, that means there's new data to read. If you want to know how to read the offset, you can run the kafka-consumer-groups utility with the --describe option. Note that you need to provide a valid Group ID to --group if you're trying out this command. The output lists, for each partition the group reads from, the group's current offset, the log end offset, and the lag between them.
Or, if you want to learn more about how to do this with the Confluent CLI for a topic hosted in Confluent Cloud, you can check out this tutorial on reading from a specific offset and partition.
There's more on the kafka-consumer-groups utility in our documentation, and you can always run kafka-consumer-groups --help for a full list of the options.
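To make the "larger offset means new data" idea concrete, here is a minimal sketch in plain Python. It does not talk to a cluster: the offset numbers are made up, and the `partition_lag` helper is a hypothetical name for illustration. It also assumes a partition with no committed offset starts at 0, which in practice depends on the consumer's reset policy.

```python
def partition_lag(committed_offsets, log_end_offsets):
    """Return {partition: lag}: how many events sit past the group's committed offset."""
    lag = {}
    for partition, end_offset in log_end_offsets.items():
        # Assumption: a partition the group never committed to starts at offset 0.
        committed = committed_offsets.get(partition, 0)
        lag[partition] = max(end_offset - committed, 0)
    return lag


# Hypothetical numbers for a three-partition topic.
committed = {0: 42, 1: 17, 2: 30}  # next offset the group will read, per partition
log_end = {0: 42, 1: 20, 2: 35}    # offset the next produced event will receive

lag = partition_lag(committed, log_end)
partitions_with_new_data = [p for p, n in lag.items() if n > 0]

print(lag)                       # {0: 0, 1: 3, 2: 5}
print(partitions_with_new_data)  # [1, 2]
```

A lag of zero means the group is caught up on that partition; anything above zero is data the group has not read yet.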
The Group ID determines which consumers belong to which group. You can assign Group IDs via configuration when you create the consumer client. If there are four consumers with the same Group ID subscribed to the same topic, they will all share the work of reading from that topic.
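As a rough sketch of what that configuration can look like, the snippet below builds a client config dictionary in Python. The broker address, the `orders-service` Group ID, and the helper names are all assumptions for illustration. The `create_consumer` function relies on the third-party `confluent_kafka` package and a reachable broker, so it is defined here but not called.

```python
def build_consumer_config(group_id, bootstrap_servers="localhost:9092"):
    """Build a client config dict; consumers that share group_id share the work."""
    return {
        "bootstrap.servers": bootstrap_servers,  # assumed local broker address
        "group.id": group_id,                    # decides which group the consumer joins
        "auto.offset.reset": "earliest",         # start point when the group has no offset yet
    }


def create_consumer(group_id, topic):
    # Imported lazily so the helper above runs without the third-party package.
    from confluent_kafka import Consumer

    consumer = Consumer(build_consumer_config(group_id))
    consumer.subscribe([topic])
    return consumer


# Four hypothetical instances of the same service all use one Group ID.
configs = [build_consumer_config("orders-service") for _ in range(4)]
print({c["group.id"] for c in configs})  # {'orders-service'}
```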
If there are eight partitions, each of those four consumers will be assigned two partitions. What if there are nine partitions? In that case, the leftover partition will be assigned to the first consumer in the group, so that one consumer reads from three partitions and the rest of the consumers read from two partitions. It's the job of the group coordinator, which runs on a broker, to trigger a rebalance whenever consumers join or leave so that partitions stay evenly distributed among the connected consumers.
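The arithmetic above can be sketched in a few lines of Python. This is a simplified, range-style assignment written for illustration only. Real clients choose among pluggable assignment strategies, and the `assign_partitions` helper and the consumer names are hypothetical.

```python
def assign_partitions(consumers, num_partitions):
    """Spread partitions as evenly as possible; earlier consumers absorb any remainder."""
    members = sorted(consumers)
    base, extra = divmod(num_partitions, len(members))
    assignment = {}
    next_partition = 0
    for index, member in enumerate(members):
        # The first `extra` members each take one partition more than the rest.
        count = base + (1 if index < extra else 0)
        assignment[member] = list(range(next_partition, next_partition + count))
        next_partition += count
    return assignment


consumers = ["consumer-1", "consumer-2", "consumer-3", "consumer-4"]

print(assign_partitions(consumers, 8))
# {'consumer-1': [0, 1], 'consumer-2': [2, 3], 'consumer-3': [4, 5], 'consumer-4': [6, 7]}
print(assign_partitions(consumers, 9))
# {'consumer-1': [0, 1, 2], 'consumer-2': [3, 4], 'consumer-3': [5, 6], 'consumer-4': [7, 8]}
```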
This whole process is predicated on the presence of a Group ID to unify the consumers. It's important to remember this while you're setting up your consumers.
If you're connecting microservices, you want to make sure that each service has its own consumer group (and hence its own Group ID). Why is that? Let's walk through an example.
Let's say there's a topic "payments," and both the "orders" microservice and the "refunds" microservice will need to read from that topic. You wouldn't want them to share the same offsets, because if they did, the progress through the "payments" topic would be shared by "orders" and "refunds," which would mean potential missed orders or refunds.
However, if you had a group of consumers handling "orders" by reading from partitions in the "payments" topic, then the current offset for each consumer in the group, stored in the broker, is vital to ensure continuous progress in case a consumer in the group crashes. At the same time, if consumers from another, separate group, like "refunds," are reading from the "payments" topic, they can continue their progress unaffected even if the consumers in the "orders" group are rebalancing.
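Here is a toy model of the "orders" and "refunds" example, assuming a single partition and an in-memory stand-in for the broker's offset record. The `OffsetStore` class and the event names are hypothetical. In a real cluster, a partition is owned by one consumer in a group at a time, so the shared-ID case would look different in detail, but the takeaway is the same: one shared bookmark means neither service sees everything.

```python
from collections import defaultdict


class OffsetStore:
    """Toy stand-in for the broker-side record of committed offsets."""

    def __init__(self):
        # (group_id, topic, partition) -> next offset to read
        self._offsets = defaultdict(int)

    def poll(self, group_id, topic, partition, log, max_events=1):
        """Return the next events for this group and advance its committed offset."""
        key = (group_id, topic, partition)
        start = self._offsets[key]
        events = log[start:start + max_events]
        self._offsets[key] = start + len(events)
        return events


payments = ["pay-1", "pay-2", "pay-3", "pay-4"]  # one partition of the "payments" topic

# Separate Group IDs: each service keeps its own bookmark and sees every payment.
store = OffsetStore()
orders_seen = store.poll("orders", "payments", 0, payments, max_events=4)
refunds_seen = store.poll("refunds", "payments", 0, payments, max_events=4)

# Shared Group ID (the mistake): both services advance one bookmark between them.
shared = OffsetStore()
orders_shared = shared.poll("shared", "payments", 0, payments, max_events=2)
refunds_shared = shared.poll("shared", "payments", 0, payments, max_events=2)

print(orders_seen == refunds_seen == payments)  # True
print(orders_shared, refunds_shared)            # ['pay-1', 'pay-2'] ['pay-3', 'pay-4']
```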
As the last example revealed, Group IDs also play a vital role in fault tolerance.
Each consumer in a group sends heartbeat requests at a set interval to the group coordinator, the broker responsible for that group. If the coordinator does not receive a heartbeat from a consumer in time, it considers that consumer dead and triggers a rebalance.
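The timeout logic can be pictured with a small simulation. This is only a sketch: the `GroupCoordinator` class, the ten-second timeout, and the explicit `now` timestamps are assumptions made to keep the example deterministic. In a real deployment the corresponding knobs are consumer settings such as `session.timeout.ms` and `heartbeat.interval.ms`.

```python
class GroupCoordinator:
    """Toy coordinator: tracks the last heartbeat seen for each group member."""

    def __init__(self, session_timeout_s):
        self.session_timeout_s = session_timeout_s
        self.last_heartbeat = {}  # member id -> time of last heartbeat, in seconds

    def heartbeat(self, member, now):
        self.last_heartbeat[member] = now

    def expire_members(self, now):
        """Drop members whose heartbeat is overdue; a non-empty result means rebalance."""
        expired = sorted(
            member
            for member, seen in self.last_heartbeat.items()
            if now - seen > self.session_timeout_s
        )
        for member in expired:
            del self.last_heartbeat[member]
        return expired


coordinator = GroupCoordinator(session_timeout_s=10)
coordinator.heartbeat("consumer-1", now=0)
coordinator.heartbeat("consumer-2", now=0)
coordinator.heartbeat("consumer-1", now=8)  # consumer-2 has gone quiet

expired = coordinator.expire_members(now=12)
print(expired)                             # ['consumer-2']
print(sorted(coordinator.last_heartbeat))  # ['consumer-1']
```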
How does a Group ID play into rebalancing? Well, whatever the cause of the rebalance, the broker's record of the associated offset determines where a consumer will begin reading after a rejoin. As long as the Group ID remains the same, the consumer can pick up from the group's last committed offset, so no events are skipped.
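To illustrate the pick-up-where-you-left-off behavior, here is a minimal simulation in which a plain dictionary stands in for the broker's offset record. The `consume` helper, the event names, and the simulated crash are hypothetical. The sketch also assumes offsets are committed after every event and that a brand-new Group ID starts from the beginning of the log.

```python
def consume(log, committed, group_id, crash_after=None):
    """Process events from the group's committed offset, committing after each one."""
    processed = []
    offset = committed.get(group_id, 0)  # assumption: unknown groups start at offset 0
    while offset < len(log):
        if crash_after is not None and len(processed) == crash_after:
            break  # simulate the consumer dying mid-stream
        processed.append(log[offset])
        offset += 1
        committed[group_id] = offset  # commit the position of the next event to read
    return processed


log = ["e0", "e1", "e2", "e3", "e4"]
committed = {}  # stands in for the broker-side offset record, keyed by Group ID

first_run = consume(log, committed, "orders", crash_after=2)
second_run = consume(log, committed, "orders")      # same Group ID: resumes
other_group = consume(log, committed, "orders-v2")  # new Group ID: starts over

print(first_run)    # ['e0', 'e1']
print(second_run)   # ['e2', 'e3', 'e4']
print(other_group)  # ['e0', 'e1', 'e2', 'e3', 'e4']
```

Changing the Group ID is what discards the bookmark: the restarted consumer with the same ID continues at `e2`, while the one with a fresh ID rereads everything.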
If you're interested in learning more about rebalancing, we recommend the blog post "Incremental Cooperative Rebalancing in Apache Kafka: Why Stop the World When You Can Change It?" You can also consult our FAQ.
In summary, when you set a consumer Group ID in the process of creating a consumer client, that Group ID assigns the consumer to its group, which has ramifications for work sharing, detecting new data, and data recovery. To learn more about this and other topics, check out these recommended resources:
Confluent Developer: Learn Apache Kafka through Confluent Developer tutorials, documentation, courses, blog posts, and examples.
Confluent Community: If you have a question about Apache Kafka or you'd like to meet other Kafka developers, head over to Confluent Community and introduce yourself on our Community Slack or Forum.
Streaming Audio Podcast: Listen to the Streaming Audio Podcast to hear lively conversations with Confluent users about the ins and outs of Apache Kafka. The episode Optimizing Kafka's Internals covers consumer group internals.