Presentation

Datalake Rock Paper Scissors: Iceberg + Flink or Iceberg + Spark?

« Current 2023

Do you have existing data pipelines for real-time data and want to add storage into the mix? Are you planning to use Apache Iceberg tables for storage, but are unsure whether to choose Apache Flink or Apache Spark to ingest the data from your Apache Kafka topics? While you have choices, how do you assess which technology is the right one for your use case?

At Bloomberg, Kafka and Iceberg are core elements of our real-time data pipelines and storage sinks. In this session, we'll share our experiences and lessons learned working with both technologies to ingest data from Kafka into our Iceberg datalake at near-real-time speeds. As we evaluate the pros and cons of Flink and Spark, we will compare the two approaches with regard to functionality, performance, fault tolerance, scaling, and resource utilization. We’ll also discuss how the bursty nature of Spark reads, the different parallelism approaches offered by Flink and Spark, and the small-file problem may affect the overall performance of your data pipelines.

When we’re done, you’ll have a better understanding of how these technologies work and can make a more informed choice for your next datalake integration.

Presenter

Sitarama Chekuri

Bloomberg

I am currently a Senior Software Engineer at Bloomberg LP. My primary focus is building real-time financial market data pipelines on the Derivatives Data Infrastructure Team using various open source technologies, including Spark, Flink, Kafka, Iceberg, and Trino.

Presenter

Ben de Vera

Bloomberg

I’m a software engineer primarily focused on data engineering. I have spent the past two years working at Bloomberg LP as an engineer in the BVAL group.
