Build Predictive Machine Learning with Flink | Workshop on Dec 18 | Register Now
We are very excited to announce the December release of KSQL, the streaming SQL engine for Apache Kafka®! As we announced in the November release blog, we are releasing KSQL on a monthly basis to make it even easier for you to get up and running with the latest and greatest functionality of KSQL to solve your own business problems.
The December release, KSQL 0.3, includes both new features that have been requested by our community as well as under-the-hood improvements for better robustness and resource utilization. If you have already been using KSQL, we encourage you to upgrade to this latest version to take advantage of the new functionality and improvements.
Since its initial release KSQL has already supported data in JSON and DELIMITED formats. In the past weeks we received many requests from our community to support additional data formats, and Avro has been by far the most requested. We are happy to announce that KSQL now supports data in Avro format through integration with the Confluent Schema Registry, which is part of the Confluent Platform. This means you can now run KSQL queries that read and write Avro data!
KSQL’s Avro support goes even beyond what many community members asked for. Instead of having to manually define Avro schemas, and then mapping those to KSQL’s columns and types in your DDL statements when creating a STREAM or a TABLE, KSQL does something even better: it automatically infers this information from a topic’s associated Avro schema in Confluent Schema Registry, so that you don’t have to deal with the hassle of figuring (and typing) this out yourself.
The following example shows how you can easily create a STREAM or TABLE from Kafka topics with Avro data. Note how you can completely omit the column definitions because KSQL will create the stream or table with the same columns as the fields in the corresponding Avro schema:
When working on Avro data, you can of course define columns and their types manually, which you may want to do if you are only interested in a subset of all the available fields in your Avro data.
Furthermore, you can now easily convert streams and tables (and, of course, their underlying topics) between Avro, JSON and delimited formats by writing a single line of KSQL. This functionality is great for real-time ETL use cases.
In the example above, the second line performs the JSON-to-Avro conversion:
Again, note how you didn’t need to specify any Avro schema in the data conversion example. As we described in the previous section, KSQL automatically manages schemas for you. This includes registering new Avro schemas automatically with Confluent Schema Registry when needed, and KSQL will of course adhere to any configured schema compatibility settings that you have defined.
Similarly, you can perform joins between streams and tables in KSQL regardless of the underlying data formats. There’s no special syntax needed; joining different data sources “just works” because KSQL’s internal data model translates automatically between the various data formats for you.
The following example joins a stream with Avro data and a table with JSON data:
Additionally, we have taken the first steps to provide metrics and observability in KSQL. This greatly enhances the operability of KSQL, like in cases where you’re monitoring KSQL capacity or when diagnosing issues. You can now see different metrics for streams, tables, and queries for every KSQL server instance.
For streams and tables, we now have DESCRIBE EXTENDED <stream/table name> statement to show statistics, such as number of messages processed per second, total messages, the time when the last message was received, as well as corresponding failure metrics.
DESCRIBE EXTENDED example:
For queries, we improved the EXPLAIN <query_id> statement to show both the query execution plan and the stream application’s topology for the query along with its message processing rate, total processed messages, the time when the last message was processed, as well as failure metrics such as serialization/deserialization errors.
EXPLAIN example:
While the 0.3 release has laid the groundwork for observability features in KSQL, we will be adding further functionality in the upcoming releases.
Lastly, KSQL servers start up faster now and have better resource utilization. For example, when a new server instance joins an existing pool of servers, or when a failed server recovers, it detects all terminated queries in its history and avoids starting and stopping processing topologies for those queries.
The year 2017 is drawing to a close, and we’d like to take this opportunity to give a shout-out to all of you who have contributed to KSQL thus far, be it in the form of feature requests, bug reports, asking or answering questions, code contributions, or participating in our KSQL beta program. If you, too, are interested in joining the beta, please reach out to us!
There’s much more on the horizon for KSQL in 2018, and we’re looking forward to collaborating with you to make KSQL the easiest and best tool to process data in Kafka.
If you have enjoyed this article, you might want to continue with the following resources to learn more about KSQL:
Tableflow can seamlessly make your Kafka operational data available to your AWS analytics ecosystem with minimal effort, leveraging the capabilities of Confluent Tableflow and Amazon SageMaker Lakehouse.
Building a headless data architecture requires us to identify the work we’re already doing deep inside our data analytics plane, and shift it to the left. Learn the specifics in this blog.