
How to Use Standard SQL Over Kafka: From the Basics to Advanced Use Cases

Several frameworks have been developed to draw data from Kafka and maintain standard SQL over continually changing data. They provide an easy way to query and transform that data, making it accessible to orders of magnitude more users.

At the same time, using standard SQL against changing data is a new pattern for many engineers and analysts. While the language itself hasn't changed, we're still in the early stages of understanding the power of SQL over Kafka, and in some interesting ways this pattern introduces exciting new idioms.

In this session, we'll start with basic use cases for standard SQL over events in Kafka, including how these SQL engines can help teams that are brand new to streaming data get started. From there, we'll cover a series of more advanced functions and their implications, including:

  • WHERE clauses that reference time change the validity intervals of your data; you can programmatically introduce and retract records based on their payloads!
  • LATERAL joins turn streams of query arguments into query results; they will automatically share their query plans and resources!
  • GROUP BY aggregations can be applied to ever-growing data collections, reducing data that wouldn't even fit in a database in the first place.
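As a flavor of these idioms, here is a minimal sketch in SQL. The table names (`events`, `searches`) and the `mz_now()` function are assumptions for illustration; the exact function for "the current logical time" varies by engine, and some of these constructs are engine-specific:

```sql
-- Temporal WHERE clause: each record is valid only while it is
-- less than 5 minutes old; the engine retracts it automatically
-- once mz_now() advances past the bound.
SELECT key, payload
FROM events
WHERE mz_now() <= event_ts + INTERVAL '5 minutes';

-- LATERAL join: each row of `searches` supplies arguments to a
-- correlated subquery, turning a stream of arguments into a
-- stream of query results.
SELECT s.term, top_hits.payload
FROM searches s,
     LATERAL (
         SELECT payload
         FROM events e
         WHERE e.key = s.term
         ORDER BY e.event_ts DESC
         LIMIT 3
     ) AS top_hits;

-- GROUP BY over an ever-growing collection: the engine maintains
-- one row per key, rather than storing the full history.
SELECT key, count(*) AS event_count
FROM events
GROUP BY key;
```

The point of the talk is that these are unmodified standard SQL statements; what changes is that the engine keeps their results up to date as the underlying Kafka data changes.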

We'll review in-production examples in which each of these cases lets unmodified standard SQL, run and maintained over data streams in Kafka, provide the functionality of bespoke stream processors.


Frank McSherry

Frank was previously at Microsoft Research Silicon Valley, where he co-invented differential privacy and subsequently led the Naiad project. Frank holds a Ph.D. in Computer Science from the University of Washington.