
Presentation

Datalake Rock Paper Scissors: Iceberg + Flink or Iceberg + Spark?

« Current 2023

Do you have existing data pipelines for real-time data and want to add storage into the mix? Are you planning to use Apache Iceberg tables for storage, but are unsure whether to choose Apache Flink or Apache Spark to ingest the data from your Apache Kafka topics? While you have choices, how do you assess which technology is the right one for your use case?

At Bloomberg, Kafka and Iceberg are core elements of our real-time data pipelines and storage sinks. In this session, we’ll share our experiences and lessons learned working with both technologies to ingest data from Kafka into our Iceberg datalake at near-real-time speeds. We’ll weigh the pros and cons of Flink and Spark, comparing the two approaches on functionality, performance, fault tolerance, scaling, and resource utilization. We’ll also discuss how the bursty nature of Spark reads, the different parallelism approaches offered by Flink and Spark, and the small-file problem may affect the overall performance of your data pipelines.

When we’re done, you’ll have a better understanding of how these technologies work and be able to make a more informed choice for your next datalake integration.
