KSQL is an open source, Apache 2.0 licensed streaming SQL engine that enables stream processing against Apache Kafka™.
KSQL makes it easy to read, write, and process streaming data in real time, at scale, using SQL-like semantics. It lets you express stream processing transformations in SQL instead of writing an application in a programming language such as Java or Python.
Currently available as a developer preview, KSQL provides powerful stream processing capabilities such as joins, aggregations, event-time windowing, and more!
Learn how to build real-time streaming applications with KSQL. This talk explains the KSQL engine architecture, and how to design and deploy interactive, continuous queries for streaming ETL and real-time analytics.
Apache Kafka is a popular choice for powering data pipelines. KSQL makes it simple to transform data within the pipeline, readying messages to cleanly land in another system.
CREATE STREAM vip_actions AS
SELECT userid, page, action
FROM clickstream c
LEFT JOIN users u ON c.userid = u.user_id
WHERE u.level = 'Platinum';
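As a rough mental model of the stream-table join above, the plain-Python sketch below enriches each click event with the user's level and keeps only Platinum users. It illustrates the semantics only, not how KSQL is implemented; the function name `vip_actions`, the dict standing in for the `users` table, and the sample records are all hypothetical.

```python
def vip_actions(clickstream, users):
    """Yield (userid, page, action) for clicks made by Platinum users.

    clickstream: iterable of dicts with keys userid, page, action (assumed shape).
    users: dict mapping user_id -> {"level": ...}, standing in for the table.
    """
    for click in clickstream:
        # LEFT JOIN: a click with no matching user still produces a row,
        # with the user columns set to None (NULL).
        user = users.get(click["userid"])
        level = user["level"] if user is not None else None
        # WHERE u.level = 'Platinum' then drops the unmatched rows as well.
        if level == "Platinum":
            yield (click["userid"], click["page"], click["action"])


# Hypothetical sample data.
users = {"u1": {"level": "Platinum"}, "u2": {"level": "Gold"}}
clicks = [
    {"userid": "u1", "page": "/cart", "action": "view"},
    {"userid": "u2", "page": "/home", "action": "view"},
    {"userid": "u3", "page": "/home", "action": "click"},  # unknown user
]
print(list(vip_actions(clicks, users)))
# prints [('u1', '/cart', 'view')]
```

Because the filter tests a column from the right-hand side, rows without a matching user never pass it, so in this sketch the LEFT JOIN behaves like an inner join.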
KSQL is a good fit for identifying patterns or anomalies in real-time data. By processing the stream as data arrives, you can identify and surface out-of-the-ordinary events with millisecond latency.
CREATE TABLE possible_fraud AS
SELECT card_number, count(*)
FROM authorization_attempts -- assumed name for the stream of card authorization events
WINDOW TUMBLING (SIZE 5 SECONDS)
GROUP BY card_number
HAVING count(*) > 3;
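To make the windowing concrete, here is a minimal Python sketch of what a 5-second tumbling-window count with a `count(*) > 3` filter computes. It is not how KSQL is implemented: it processes a finite list, whereas KSQL updates its results continuously as events arrive. The function name `flag_possible_fraud`, the `(timestamp_ms, card_number)` event shape, and the sample data are assumptions made for illustration.

```python
from collections import Counter

WINDOW_MS = 5_000  # tumbling window size: 5 seconds


def flag_possible_fraud(events, window_ms=WINDOW_MS, threshold=3):
    """Count events per (card_number, window) and keep counts above threshold.

    events: iterable of (timestamp_ms, card_number) pairs (hypothetical shape).
    Returns {(card_number, window_start_ms): count}.
    """
    counts = Counter()
    for timestamp_ms, card_number in events:
        # Tumbling windows are fixed-size and non-overlapping, so each event
        # belongs to exactly one window, aligned to a multiple of window_ms.
        window_start = timestamp_ms - (timestamp_ms % window_ms)
        counts[(card_number, window_start)] += 1
    # Equivalent of HAVING count(*) > threshold.
    return {key: n for key, n in counts.items() if n > threshold}


# Hypothetical sample data: four attempts on one card within a single window.
attempts = [
    (1_000, "4111-0000"),
    (1_800, "4111-0000"),
    (2_500, "4111-0000"),
    (4_900, "4111-0000"),
    (5_100, "4111-0000"),  # falls into the next window
    (2_000, "5500-1111"),
]
print(flag_possible_fraud(attempts))
# prints {('4111-0000', 0): 4}
```

The fifth attempt on the same card lands in the window starting at 5,000 ms, so it is counted separately and does not push that window over the threshold.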
Kafka’s ability to provide scalable, ordered messages with stream processing makes it a common solution for log data monitoring and alerting. KSQL lends a familiar syntax for tracking, understanding, and managing alerts.
CREATE TABLE error_counts AS
SELECT error_code, count(*)
FROM monitoring_stream
WINDOW TUMBLING (SIZE 1 MINUTE)
WHERE type = 'ERROR'
GROUP BY error_code;