[Webinar] How to Implement Data Contracts: A Shift Left to First-Class Data Products | Register Now

Presentation

Don’t Forget About Your Past—Optimizing Apache Druid Performance

« Current 2022

Businesses need to react to results immediately; to achieve this, real-time processing is becoming a requirement in many analytic verticals. But sometimes, the move from batch to real-time can leave you in a pinch. How do you handle and correct mistakes in your data? How do you migrate a new system to real-time along with historical data?

Let’s start with how to run Apache Druid locally with your containerized-based development environment. While streaming real-time events from Kafka into Druid, an S3 Complaint Store captures messages via Kafka Connect, for historical processing. An exploration of performance implications when the real-time stream of events contains historical data and how that affects performance and the techniques to prevent those issues, leaving a high-performance analytic platform supporting real-time and historical processing.

You’ll leave with the tools of doing real-time analytic processing and historical batch processing from a single source of truth. Your Druid cluster will have better rollups (pre-computed aggregates) and fewer segments, which reduces cost and improves query performance.

Presenter

Neil Buesing

Kinetic Edge

Neil brings 25+ years of experience with midmarket and enterprise companies, tackling complex data and IT projects. More recently, Neil has become an avid fan and expert of all things Apache Kafka and the Kafka ecosystem.

Don’t Forget About Your Past—Optimizing Apache Druid Performance

Presenter

Neil Buesing

Related Links

How Confluent Completes Apache Kafka eBook

Leverage a cloud-native service 10x better than Apache Kafka

Confluent Developer Center

Spend less on Kafka with Confluent, come see how