In today's fast-paced global e-commerce industry, online shoppers generate massive amounts of data. Delivering real-time analytics, effective advertising campaigns, and machine-learning-based personalized recommendations from that data is crucial, but building a reliable, scalable data pipeline to support it is a challenging task.
In this talk, we'll share how we tackled the challenge of building a robust, fully managed data pipeline using a combination of streaming analytics, batch processing, a data lake, and machine learning. Our platform, built on Google Cloud Platform and powered by Confluent Kafka, enables us to process a massive volume of events every day.
We'll dive into the technical details of our architecture, tech stack, and data flow, including how we:
• use Kafka Streams Java applications deployed on Kubernetes to consume, deduplicate, transform, and filter events, and write them into an HBase NoSQL database for real-time analytics (see the sketch after this list),
• push events to Meta for advertising campaigns (example below),
• use Google AI for personalized recommendations, and
• use the Confluent sink connector to push events to Google Cloud Storage and BigQuery, with ksqlDB for bot filtering (examples below).
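
To make the Kafka Streams step concrete, here is a minimal deduplication sketch in Java using the classic Transformer API. It assumes events are keyed by a client-generated event id; the application id, broker address, topic names (`raw-events`, `clean-events`), and store name are placeholders rather than our production values, and a real deployment would use a windowed store with retention instead of an ever-growing key-value store.

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Produced;
import org.apache.kafka.streams.kstream.Transformer;
import org.apache.kafka.streams.processor.ProcessorContext;
import org.apache.kafka.streams.state.KeyValueStore;
import org.apache.kafka.streams.state.Stores;

public class DedupPipeline {

  public static void main(String[] args) {
    Properties props = new Properties();
    props.put(StreamsConfig.APPLICATION_ID_CONFIG, "clickstream-dedup"); // placeholder
    props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092");    // placeholder

    StreamsBuilder builder = new StreamsBuilder();

    // State store that remembers which event ids have already been processed.
    builder.addStateStore(Stores.keyValueStoreBuilder(
        Stores.persistentKeyValueStore("seen-event-ids"),
        Serdes.String(), Serdes.Long()));

    builder.stream("raw-events", Consumed.with(Serdes.String(), Serdes.String()))
        .transform(DedupTransformer::new, "seen-event-ids") // drop duplicates
        .filter((eventId, json) -> json != null)            // drop tombstones
        .to("clean-events", Produced.with(Serdes.String(), Serdes.String()));

    new KafkaStreams(builder.build(), props).start();
  }

  /** Forwards each event id the first time it is seen and drops repeats. */
  static class DedupTransformer
      implements Transformer<String, String, KeyValue<String, String>> {

    private KeyValueStore<String, Long> store;

    @Override
    @SuppressWarnings("unchecked")
    public void init(ProcessorContext context) {
      store = (KeyValueStore<String, Long>) context.getStateStore("seen-event-ids");
    }

    @Override
    public KeyValue<String, String> transform(String eventId, String value) {
      if (store.get(eventId) != null) {
        return null; // duplicate: returning null emits nothing downstream
      }
      store.put(eventId, System.currentTimeMillis()); // remember first sighting
      return KeyValue.pair(eventId, value);
    }

    @Override
    public void close() {}
  }
}
```

In production, a downstream processor (or a Connect sink) writes the cleaned stream into HBase; we keep the sketch topic-to-topic so it stays self-contained.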
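
The push to Meta is, at its core, a server-side HTTP integration. The sketch below shows the general shape of an event upload to Meta's Conversions API using Java's built-in HTTP client; the pixel id, access token, API version, and payload fields are placeholders, and this illustrates the pattern rather than our exact production code.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class MetaEventPush {

  public static void main(String[] args) throws Exception {
    String pixelId = "YOUR_PIXEL_ID";   // placeholder
    String accessToken = "YOUR_TOKEN";  // placeholder

    // One purchase event in Conversions API shape; values are illustrative.
    // Meta expects at least one hashed user identifier in user_data.
    String payload = """
        {"data":[{
          "event_name": "Purchase",
          "event_time": %d,
          "action_source": "website",
          "user_data": {"em": ["<sha256-hashed email>"]},
          "custom_data": {"currency": "USD", "value": "19.99"}
        }]}""".formatted(System.currentTimeMillis() / 1000);

    HttpRequest request = HttpRequest.newBuilder()
        .uri(URI.create("https://graph.facebook.com/v18.0/" + pixelId
            + "/events?access_token=" + accessToken))
        .header("Content-Type", "application/json")
        .POST(HttpRequest.BodyPublishers.ofString(payload))
        .build();

    HttpResponse<String> response =
        HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
    System.out.println(response.statusCode() + " " + response.body());
  }
}
```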
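
The Google Cloud Storage and BigQuery legs are configuration rather than code. If you run a self-managed Connect worker, a request like the sketch below registers Confluent's GCS sink connector over the Kafka Connect REST API (fully managed connectors on Confluent Cloud take the same configuration through the UI or CLI instead); the bucket, topic, endpoint, and sizing values are placeholders, and real deployments also need credential and licensing settings we omit here.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class DeployGcsSink {

  public static void main(String[] args) throws Exception {
    // Connector definition for the Confluent GCS sink; all values are placeholders.
    String body = """
        {
          "name": "gcs-sink-clean-events",
          "config": {
            "connector.class": "io.confluent.connect.gcs.GcsSinkConnector",
            "topics": "clean-events",
            "gcs.bucket.name": "example-event-archive",
            "format.class": "io.confluent.connect.gcs.format.json.JsonFormat",
            "flush.size": "1000",
            "tasks.max": "2"
          }
        }""";

    HttpRequest request = HttpRequest.newBuilder()
        .uri(URI.create("http://connect:8083/connectors")) // Connect REST endpoint (placeholder host)
        .header("Content-Type", "application/json")
        .POST(HttpRequest.BodyPublishers.ofString(body))
        .build();

    HttpResponse<String> response =
        HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
    System.out.println(response.statusCode() + " " + response.body());
  }
}
```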
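
Finally, here is roughly what a ksqlDB bot filter looks like when submitted through ksqlDB's Java client. The stream name, column names, and threshold (`raw_clicks`, `user_agent`, `requests_per_minute`, 120) are illustrative stand-ins for an actual schema.

```java
import io.confluent.ksql.api.client.Client;
import io.confluent.ksql.api.client.ClientOptions;

public class BotFilterQuery {

  public static void main(String[] args) throws Exception {
    // Connect to a ksqlDB server; host and port are placeholders.
    ClientOptions options = ClientOptions.create()
        .setHost("ksqldb-server")
        .setPort(8088);
    Client client = Client.create(options);

    // Persistent query that keeps only traffic that looks human.
    String statement =
        "CREATE STREAM clean_clicks AS "
      + "  SELECT * FROM raw_clicks "
      + "  WHERE user_agent NOT LIKE '%bot%' "
      + "    AND requests_per_minute < 120 "
      + "  EMIT CHANGES;";

    client.executeStatement(statement).get(); // waits for the server to accept it
    client.close();
  }
}
```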
We'll also cover our observability, monitoring, and deployment practices.
But we don't want to just talk about our pipeline; we want to help you build one too. You'll leave our talk with practical insights and lessons learned from our experience, including tips on building a reliable, fault-tolerant, and scalable data pipeline, choosing the right tech stack, and ensuring end-to-end observability. Join us, and learn how to take your data pipeline to the next level.