Apache Kafka 파티션 전체에서 데이터 균형 조정

« Current 2022

Apache Kafka is a distributed system. At the heart of Apache Kafka is a set of brokers, which allow to store the records persistently across different topics. Topics, in turn, are split into partitions. Dividing topics into such pieces allows us to use data from multiple partitions in parallel, so that producers and consumers can work with data simultaneously and achieve higher data throughput.

Such parallelisation is the key to a performant cluster, however it comes with a price. The thing is, reading from multiple partitions will eventually mess up the order of records, meaning that the resulting order will be different from when the data was pushed into the cluster.

This happens because when consuming data from multiple partitions, the order of partitions is not guaranteed. Instead, we must rely on the order of the records within a single partition, where the data is guaranteed to maintain the original sequence. We need to use this characteristic of Apache Kafka to our advantage in those cases where the ordering of the records is important for our system.

Therefore, when building our product architecture we should carefully weigh up how we will balance records across partitions and what mechanisms we will use to ensure that the sequence of the messages remains correct when data is read by multiple consumers.

In this talk we'll discuss mechanisms you can use to balance your data, such as keys, composite message key, role of hashing, custom partitions and other things you need to keep in mind when splitting data across partitions.

If you are fresh to Apache Kafka, or you're looking for good practices to design your topics and avoid common pitfalls, you'll find this session useful!

발표자

Olena Kutsenko

Aiven

Olena is a seasoned expert in data, sustainable software development, and teamwork. With a background in software engineering, she's led teams and developed mission-critical applications at Nokia, HERE Technologies, and AWS. Currently, she works at Aiven where she supports developers and customers in using open-source data technologies such as Apache Kafka, ClickHouse, and OpenSearch. She is also an international public speaker and regularly present at conferences around the world. She holds AWS Developer and Solutions Architect certifications.

Apache Kafka 파티션 전체에서 데이터 균형 조정

발표자

Olena Kutsenko

Related Links

How Confluent Completes Apache Kafka eBook

Leverage a cloud-native service 10x better than Apache Kafka

Confluent Developer Center

Spend less on Kafka with Confluent, come see how