What Are Apache Kafka Consumer Group IDs?

Consumer Group IDs are a vital part of consumer configuration in Apache Kafka®. Setting the consumer Group ID determines what group a consumer belongs to, which has some major consequences. There are three areas in which Group IDs are particularly pertinent:

  • Detecting new data

  • Work sharing

  • Fault tolerance

Let's dive in.

What is a Kafka consumer?

Kafka consumers do the work of reading event streams. They read events, or messages, that producers have written to logs called topics. Topics are further split into partitions, which are append-only logs that store the messages. This enables each topic to be hosted and replicated across a number of brokers.

As you can see in the diagram, a given consumer in a consumer group can read from multiple partitions, including multiple partitions housed in the same topic.

Using consumer Group IDs to detect new data

Through the broker, Group IDs are associated with bits of information called offsets. An offset specifies the location of a given event within a partition and, as such, represents progress through the topic. Offsets in consumer groups serve the same purpose as bookmarks or sticky tabs in books. You can learn more about offsets in our FAQ.

Checking for new data

You can use a particular Group ID's offset to check whether new data has been written to the partition. If there's an event with a larger offset, that means there's new data to read. To read your offsets, you can use the kafka-consumer-groups utility:

kafka-consumer-groups --bootstrap-server localhost:9092 --describe --group group1 --offsets

Note that you need to provide a valid Group ID to --group if you're trying out this command. The output will resemble the following:

`GROUP      TOPIC      PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG  OWNER
Groupname  topicname  0          2               2               0    ownername`

Or, if you want to learn more about how to do this with the Confluent CLI for a topic hosted in Confluent Cloud, you can check out this tutorial on reading from a specific offset and partition.

There's more on the kafka-consumer-groups utility in our documentation, and you can always run kafka-consumer-groups --help for a full list of the options.
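To make the "is there new data?" check concrete, here is a minimal Python sketch of the arithmetic behind the LAG column: lag is the log end offset minus the group's current offset. The `PartitionOffsets` type and the sample values are hypothetical, chosen only for illustration; they are not real tool output.

```python
from typing import NamedTuple


class PartitionOffsets(NamedTuple):
    """One row of offset information for a group (hypothetical type)."""
    topic: str
    partition: int
    current_offset: int  # the group's committed position in the partition
    log_end_offset: int  # offset the next written message will receive


def lag(p: PartitionOffsets) -> int:
    """Number of messages in the partition the group has not yet consumed."""
    return max(p.log_end_offset - p.current_offset, 0)


def has_new_data(p: PartitionOffsets) -> bool:
    """True when the log extends past the group's committed position."""
    return lag(p) > 0


# Illustrative values only.
caught_up = PartitionOffsets("topicname", 0, current_offset=2, log_end_offset=2)
behind = PartitionOffsets("topicname", 1, current_offset=2, log_end_offset=5)

print(lag(caught_up), has_new_data(caught_up))  # 0 False
print(lag(behind), has_new_data(behind))        # 3 True
```

A lag of zero means the group is caught up on that partition; any positive lag means there is new data to read.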

Consumer Group IDs in work sharing

The Group ID determines which consumers belong to which group. You can assign Group IDs via configuration when you create the consumer client. If there are four consumers with the same Group ID subscribed to the same topic, they will all share the work of reading from that topic.
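As a rough illustration, the configuration for four consumers in one group might be built like this in Python. The keys `bootstrap.servers`, `group.id`, `client.id`, and `auto.offset.reset` are standard Kafka consumer settings; the helper `consumer_config`, the broker address, and the group and client names are assumptions made up for this sketch.

```python
def consumer_config(group_id: str, client_id: str) -> dict:
    """Build a consumer configuration dict (hypothetical helper).

    Assumption: a dict like this would be handed to a Kafka client
    library when the consumer is created; no client is created here.
    """
    return {
        "bootstrap.servers": "localhost:9092",  # assumed local broker
        "group.id": group_id,                   # decides group membership
        "client.id": client_id,                 # distinguishes the instances
        "auto.offset.reset": "earliest",        # where to start with no committed offset
    }


# Four consumers sharing one Group ID, so they share the work on a topic.
configs = [consumer_config("orders", f"orders-{i}") for i in range(4)]

groups = {c["group.id"] for c in configs}
print(groups)  # {'orders'}
```

Because every instance carries the same `group.id`, all four would be treated as members of a single group.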

If there are eight partitions, each of those four consumers will be assigned two partitions. What if there are nine partitions? The leftover partition is then assigned to one consumer (the first in the group, in this example), so that one consumer reads from three partitions and the rest read from two. The group coordinator, which runs on a broker, manages group membership and triggers a rebalance whenever consumers join or leave, so that partitions stay evenly distributed among the connected consumers.

Note: At the top, you'll see that although there are four consumers, three are idle. That's because only one consumer in the same group can read from a single partition.
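The arithmetic above can be sketched in a few lines of Python. This is a simplified, range-style assignment written for illustration, assuming consumers are ordered by name; Kafka's built-in assignors differ in detail, and `assign_partitions` is a hypothetical function, not a Kafka API.

```python
def assign_partitions(consumers: list[str], num_partitions: int) -> dict[str, list[int]]:
    """Split partitions 0..num_partitions-1 across consumers as evenly as possible.

    Leftover partitions go to the first consumers in sorted order, so the
    partition counts of any two consumers differ by at most one.
    """
    if not consumers:
        raise ValueError("at least one consumer is required")
    members = sorted(consumers)
    base, extra = divmod(num_partitions, len(members))
    assignment: dict[str, list[int]] = {}
    start = 0
    for index, member in enumerate(members):
        count = base + (1 if index < extra else 0)
        assignment[member] = list(range(start, start + count))
        start += count
    return assignment


consumers = [f"consumer-{i}" for i in range(4)]

nine = assign_partitions(consumers, 9)
print({m: len(p) for m, p in nine.items()})
# {'consumer-0': 3, 'consumer-1': 2, 'consumer-2': 2, 'consumer-3': 2}

# With a single partition, only one consumer gets work; the rest are idle.
one = assign_partitions(consumers, 1)
print([m for m, p in one.items() if not p])
# ['consumer-1', 'consumer-2', 'consumer-3']
```

The second call mirrors the note above: a partition is read by only one consumer in a group, so adding more consumers than partitions leaves the extras idle.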

This whole process is predicated on the presence of a Group ID to unify the consumers. It's important to remember this while you're setting up your consumers.

If you're connecting microservices, you want to make sure that each service has its own consumer group (and hence its own Group ID). Why is that? Let's walk through an example.

Let's say there's a topic "payments," and both the "orders" microservice and the "refunds" microservice will need to read from that topic. You wouldn't want them to share the same offsets, because if they did, the progress through the "payments" topic would be shared by "orders" and "refunds," which would mean potential missed orders or refunds.

However, if you had a group of consumers handling "orders" by reading from partitions in the "payments" topic, then the current offset for each consumer in the group, stored in the broker, is vital to ensure continuous progress in case a consumer in the group crashes. At the same time, if consumers from another, separate group, like "refunds" are reading from the "payments" topic, they can continue their progress unaffected even if the consumers in the "orders" group are rebalancing.
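The difference between separate and shared Group IDs can be sketched with a toy in-memory model in Python. `OffsetStore` is a hypothetical stand-in for one partition plus the broker's per-group offset bookkeeping, and the shared group name "payments-readers" is made up for the example. In a real cluster the split happens per partition rather than per message, but the effect is similar: with a shared Group ID, each service sees only part of the topic.

```python
from collections import defaultdict


class OffsetStore:
    """Toy model: one partition log plus one read position per Group ID."""

    def __init__(self, log):
        self.log = list(log)
        self.next_offset = defaultdict(int)  # group_id -> next offset to read

    def poll(self, group_id):
        """Return the next unread message for this group, or None if caught up."""
        offset = self.next_offset[group_id]
        if offset >= len(self.log):
            return None
        self.next_offset[group_id] = offset + 1
        return self.log[offset]


payments = ["p0", "p1", "p2", "p3"]

# Separate Group IDs: each service tracks its own progress and sees everything.
separate = OffsetStore(payments)
orders_seen = [separate.poll("orders") for _ in payments]
refunds_seen = [separate.poll("refunds") for _ in payments]
print(orders_seen)   # ['p0', 'p1', 'p2', 'p3']
print(refunds_seen)  # ['p0', 'p1', 'p2', 'p3']

# Shared Group ID: progress is shared, so each service misses messages.
shared = OffsetStore(payments)
orders_shared, refunds_shared = [], []
for _ in range(2):
    orders_shared.append(shared.poll("payments-readers"))
    refunds_shared.append(shared.poll("payments-readers"))
print(orders_shared)   # ['p0', 'p2']
print(refunds_shared)  # ['p1', 'p3']
```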

The role of consumer Group IDs in fault tolerance

As the last example revealed, Group IDs also play a vital role in fault tolerance.

What happens when a consumer crashes?

Each consumer sends "heartbeat requests" at a set interval to its group's coordinator, which is one of the brokers. If the coordinator does not receive a heartbeat from a consumer in time, that is, within the session timeout, a rebalance is triggered.

How does a Group ID play into rebalancing? Whether a consumer rejoins after a crash or another consumer in the group takes over its partitions, the broker's record of the committed offset for that Group ID determines where reading begins. As long as the Group ID remains the same, the group can pick up where it left off, from the last committed offset, rather than skipping data written in the meantime.
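Here is a minimal Python sketch of that idea, under simplifying assumptions: a toy coordinator records when it last heard from each consumer, drops members whose heartbeats are older than a session timeout, and keeps committed offsets so that whoever takes over a partition knows where to resume. `GroupCoordinator` and its methods are hypothetical, and time is passed in as plain numbers; real clients control this timing through settings such as `heartbeat.interval.ms` and `session.timeout.ms`.

```python
class GroupCoordinator:
    """Toy coordinator for a single consumer group (illustration only)."""

    def __init__(self, session_timeout: float):
        self.session_timeout = session_timeout
        self.last_heartbeat: dict[str, float] = {}  # member -> last heartbeat time
        self.committed: dict[int, int] = {}         # partition -> committed offset

    def heartbeat(self, member: str, now: float) -> None:
        """Record that a member is alive at time `now`."""
        self.last_heartbeat[member] = now

    def commit(self, partition: int, offset: int) -> None:
        """Store the group's progress on a partition."""
        self.committed[partition] = offset

    def expire_members(self, now: float) -> list[str]:
        """Remove members that missed the session timeout; a rebalance would follow."""
        expired = sorted(
            member
            for member, seen in self.last_heartbeat.items()
            if now - seen > self.session_timeout
        )
        for member in expired:
            del self.last_heartbeat[member]
        return expired

    def resume_offset(self, partition: int) -> int:
        """Where the next owner of the partition starts reading."""
        return self.committed.get(partition, 0)


coordinator = GroupCoordinator(session_timeout=10.0)
coordinator.heartbeat("consumer-a", now=0.0)
coordinator.heartbeat("consumer-b", now=0.0)
coordinator.commit(partition=0, offset=42)    # progress made by consumer-a

coordinator.heartbeat("consumer-b", now=9.0)  # consumer-a has crashed and stays silent
print(coordinator.expire_members(now=12.0))   # ['consumer-a']
print(coordinator.resume_offset(0))           # 42
```

The committed offset belongs to the group rather than to the crashed consumer, which is why the surviving member can continue from offset 42.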

If you're interested in learning more about rebalancing, we recommend the blog post Incremental Cooperative Rebalancing in Apache Kafka: Why Stop the World When You Can Change It?. You can also consult our FAQ.

Where to go from here

In summary, when you set a consumer Group ID in the process of creating a consumer client, that Group ID assigns the consumer to its group, which has ramifications for work sharing, detecting new data, and data recovery. To learn more about this and other topics, check out these recommended resources:

  • Confluent Developer: Learn Apache Kafka through Confluent Developer tutorials, documentation, courses, blog posts, and examples.

  • Confluent Community: If you have a question about Apache Kafka or you'd like to meet other Kafka developers, head over to Confluent Community and introduce yourself on our Community Slack or Forum.

  • Streaming Audio Podcast: Listen to the Streaming Audio Podcast to hear lively conversations with Confluent users about the ins and outs of Apache Kafka. The episode Optimizing Kafka's Internals covers consumer group internals.

Lucia Cerchie is a developer advocate at Confluent. She believes in a human-centered developer experience, in the teaching of responsibility of developer advocates, and in the joy of learning.
