Register for Demo | Confluent Terraform Provider, Independent Network Lifecycle Management and more within our Q3’22 launch!

Apache Kafka’s Transactions in the Wild! Developing an exactly-once KafkaSink in Apache Flink

Apache Kafka is one of the most commonly used connectors with Apache Flink for exactly-once streaming use cases. The combination of both systems allows you to build mission-critical systems that require low end-to-end latency and exactly-once processing eg. banks processing transactions. In Apache Flink 1.14, we released a new KafkaSink based on Apache Flink’s unified Sink interface that natively supports streaming and batch executions.

However, we needed to stretch Kafka’s transactions API to fully support exactly-once processing in Flink. In this talk, we will start with a quick recap of Apache Kafka’s transactions and Flink’s checkpointing mechanism. Then, we describe the two-phase commit protocol implemented in KafkaSink in-depth and emphasize the difficulties we have overcome when applying Kafka’s transaction API to longer-lasting transactions. We explain how we ensure performant writing to Apache Kafka and how the KafkaSink recovery works.

In summary, this talk should give users a deep dive into how Apache Flink leverages Apache Kafka’s transactions and developers an overview of what they have to consider when using Apache Kafka’s transactions.

Presenter

Fabian Paul

Fabian is a software engineer at Ververica. He is part of the SDK team that is responsible for developing all components of Apache Flink's ecosystem i.e. connector frameworks, APIs.