KSQL is an open source, Apache 2.0 licensed streaming SQL engine that enables stream processing against Apache Kafka®.
KSQL makes it easy to read, write, and process streaming data in real time, at scale, using SQL-like semantics. It offers an easy way to express stream processing transformations as an alternative to writing an application in a programming language such as Java or Python.
Currently available as a developer preview, KSQL provides powerful stream processing capabilities such as joins, aggregations, event-time windowing, and more!
Learn how to build real-time streaming applications with KSQL. This talk explains the KSQL engine architecture, and how to design and deploy interactive, continuous queries for streaming ETL and real-time analytics.
Apache Kafka is a popular choice for powering data pipelines. KSQL makes it simple to transform data within the pipeline, readying messages to cleanly land in another system.
CREATE STREAM vip_actions AS
SELECT userid, page, action
FROM clickstream c
LEFT JOIN users u ON c.userid = u.user_id
WHERE u.level = 'Platinum';
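To make the join semantics concrete, here is a minimal plain-Python sketch (not KSQL, and not how the engine is implemented) of what the query above computes for each click event. The `vip_actions` function, the dictionary shapes, and the sample users are illustrative assumptions; the `users` dict stands in for the table side of the join.

```python
def vip_actions(clickstream, users):
    """Yield userid, page, action for click events from Platinum users.

    clickstream: iterable of dicts with keys userid, page, action
    users: dict mapping user_id -> dict with a "level" key (hypothetical shape)
    """
    for event in clickstream:
        # LEFT JOIN: look up the user row; a click with no match gets None.
        user = users.get(event["userid"])
        # WHERE u.level = 'Platinum': unmatched clicks have no level, so they fail.
        if user is not None and user["level"] == "Platinum":
            yield {key: event[key] for key in ("userid", "page", "action")}


# Hypothetical sample data, for illustration only.
users = {"alice": {"level": "Platinum"}, "bob": {"level": "Gold"}}
clicks = [
    {"userid": "alice", "page": "/cart", "action": "view"},
    {"userid": "bob", "page": "/home", "action": "view"},
    {"userid": "carol", "page": "/home", "action": "click"},
]
result = list(vip_actions(clicks, users))
print(result)
```

Note that the filter on `u.level` discards clicks that have no matching user, so in this sketch the LEFT JOIN ends up behaving like an inner join.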
KSQL is a good fit for identifying patterns or anomalies in real-time data. By processing the stream as data arrives, you can identify and surface out-of-the-ordinary events with millisecond latency.
CREATE TABLE possible_fraud AS
SELECT card_number, count(*)
FROM authorization_attempts -- source stream name is assumed
WINDOW TUMBLING (SIZE 5 SECONDS)
GROUP BY card_number
HAVING count(*) > 3;
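As a rough illustration of what the tumbling window does, the following plain-Python sketch (again not KSQL) assigns each event to a fixed, non-overlapping 5-second bucket, counts events per card and bucket, and keeps only the counts above the threshold. The function name, the `(timestamp_ms, card_number)` event shape, and the sample data are assumptions made for this example. It runs over a finished list, whereas KSQL updates its results continuously as events arrive.

```python
from collections import Counter

WINDOW_SIZE_MS = 5_000  # mirrors WINDOW TUMBLING (SIZE 5 SECONDS)


def possible_fraud(events, threshold=3):
    """Count events per (card_number, window_start) and keep counts > threshold.

    events: iterable of (timestamp_ms, card_number) tuples (hypothetical shape)
    """
    counts = Counter()
    for timestamp_ms, card_number in events:
        # Tumbling windows are aligned to multiples of the window size,
        # so each event belongs to exactly one window.
        window_start = timestamp_ms - (timestamp_ms % WINDOW_SIZE_MS)
        counts[(card_number, window_start)] += 1
    # HAVING count(*) > 3
    return {key: n for key, n in counts.items() if n > threshold}


# Hypothetical sample data: card 4111 is used four times in the first window.
attempts = [
    (1_000, "4111"), (1_500, "4111"), (2_000, "4111"), (4_900, "4111"),
    (5_100, "4111"),
    (1_200, "5500"), (6_000, "5500"),
]
flagged = possible_fraud(attempts)
print(flagged)  # {('4111', 0): 4}
```

The fifth attempt on card 4111 falls into the next window (starting at 5000 ms), so it is counted separately and does not raise the first window's total.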
Kafka’s ability to provide scalable, ordered messages with stream processing makes it a common solution for log data monitoring and alerting. KSQL lends a familiar syntax for tracking, understanding, and managing alerts.
CREATE TABLE error_counts AS
SELECT error_code, count(*)
FROM monitoring_stream
WINDOW TUMBLING (SIZE 1 MINUTE)
WHERE type = 'ERROR'
GROUP BY error_code;