Apache Kafka’s Transactions in the Wild! Developing an exactly-once KafkaSink in Apache Flink

« Kafka Summit London 2022

Apache Kafka is one of the most commonly used connectors with Apache Flink for exactly-once streaming use cases. Combining the two systems lets you build mission-critical applications that require low end-to-end latency and exactly-once processing, such as banks processing transactions. In Apache Flink 1.14, we released a new KafkaSink based on Apache Flink’s unified Sink interface, which natively supports both streaming and batch execution.

However, we needed to stretch Kafka’s transactions API to fully support exactly-once processing in Flink. In this talk, we start with a quick recap of Apache Kafka’s transactions and Flink’s checkpointing mechanism. Then, we describe in depth the two-phase commit protocol implemented in KafkaSink and highlight the difficulties we overcame when applying Kafka’s transaction API to longer-lasting transactions. Finally, we explain how we ensure performant writes to Apache Kafka and how KafkaSink recovery works.
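To make the two-phase commit idea concrete, here is a minimal toy sketch in Python. It is not the KafkaSink implementation and uses no Kafka or Flink API: `ToyTransaction`, `ToyTopic`, and `ToyTwoPhaseSink` are hypothetical names invented for illustration. It assumes the general pattern in which a checkpoint pre-commits the open transaction, and the transaction is committed only once the checkpoint is reported complete.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTransaction:
    """Stand-in for one transaction (hypothetical, not a Kafka client object)."""
    txn_id: str
    records: list = field(default_factory=list)
    state: str = "open"  # open -> pre-committed -> committed, or aborted


class ToyTopic:
    """Exposes only records from committed transactions."""
    def __init__(self):
        self.visible = []


class ToyTwoPhaseSink:
    """Toy sink: pre-commit on checkpoint, commit on checkpoint completion."""

    def __init__(self, topic, prefix="toy-sink"):
        self.topic = topic
        self.prefix = prefix
        self.counter = 0
        self.pending = {}  # checkpoint id -> pre-committed transaction
        self.current = self._begin()

    def _begin(self):
        self.counter += 1
        return ToyTransaction(f"{self.prefix}-{self.counter}")

    def write(self, record):
        # Records go into the open transaction and are not yet visible.
        self.current.records.append(record)

    def snapshot(self, checkpoint_id):
        # Phase 1: pre-commit the open transaction and start a new one.
        self.current.state = "pre-committed"
        self.pending[checkpoint_id] = self.current
        self.current = self._begin()

    def notify_checkpoint_complete(self, checkpoint_id):
        # Phase 2: commit all transactions covered by this checkpoint.
        for cid in sorted(c for c in self.pending if c <= checkpoint_id):
            txn = self.pending.pop(cid)
            txn.state = "committed"
            self.topic.visible.extend(txn.records)

    def recover(self, last_completed_checkpoint):
        # Commit what the last completed checkpoint covers, abort the rest.
        self.notify_checkpoint_complete(last_completed_checkpoint)
        for txn in list(self.pending.values()) + [self.current]:
            txn.state = "aborted"
        self.pending.clear()
        self.current = self._begin()
```

In this sketch, a pre-committed transaction stays pending until its checkpoint completes, so transactions live at least as long as one checkpoint cycle. After `recover(n)`, records written after checkpoint `n` are dropped from the sink's point of view; the sketch assumes the source replays them.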

In summary, this talk gives users a deep dive into how Apache Flink leverages Apache Kafka’s transactions, and gives developers an overview of what to consider when using Apache Kafka’s transactions.

Presenter

Fabian Paul

Confluent

Fabian Paul is a Senior Software Developer at Confluent and a Committer to the Apache Flink project. He is part of the team developing Confluent Platform for Apache Flink. Prior to joining Confluent, he worked at Databricks securing multiuser workloads on Apache Spark.

He also worked at Ververica, where he was responsible for redesigning Apache Flink's sink framework to build sinks for modern data lakes such as Delta Lake and Apache Iceberg.
