Getting Started with the Kafka Streams API using Confluent Docker Images

Michael Noll

Introduction

What’s great about the Kafka Streams API is not just how fast your application can process data with it, but also how fast you can get up and running with your application in the first place, regardless of whether you are implementing your applications in Java or other JVM-based languages such as Scala and Clojure. Unlike competing technologies, Apache Kafka™ and its Streams API do not require installing a separate processing cluster, and they are equally viable for small, medium, large, and very large use cases.

In fact, it’s pretty common for our users to have their first application or proof-of-concept running in a matter of minutes. Some users, for example, opt to test-drive and develop their applications on their laptops against embedded, in-memory instances of Kafka and related services such as Confluent Schema Registry. And they also use the same setup for automated integration testing in CI environments backed by Jenkins or Travis CI. Our own GitHub repo containing the Confluent demo applications uses exactly such a setup.

Docker and Kafka Streams API: A Perfect Match

Many developers love container technologies such as Docker and the Confluent Docker images to speed up the iterative development they’re doing on their laptops: for example, to quickly spin up a containerized Confluent Open Source deployment consisting of multiple services such as Apache Kafka, Confluent Schema Registry, and Confluent REST Proxy for Kafka.

Docker is also a very popular choice among Kafka users for containerizing and deploying applications and microservices on platforms such as Kubernetes or in the cloud. And yes, unlike related technologies such as Apache® Spark™ or Apache Flink®, where you must install and run special processing clusters into which you then submit cluster-specific “processing jobs,” you actually can containerize applications that use the Kafka Streams API because these are standard Java applications. (As a side note, these applications are backward and forward compatible with Kafka cluster versions, which makes such deployments flexible enough to accommodate independently working teams across a company.)

This also means you are able to use the same organizational processes and technical tooling for development, testing, packaging, deployment, and monitoring of your Kafka Streams applications that you use everywhere else inside your company. For example, if you don’t like containers but prefer deploying to VMs with Puppet or Ansible, no problem. If you do like containers and enjoy deploying to Kubernetes or a cloud service like AWS EC2, no problem either. And speaking of containers and Docker, this brings us to the focus of this blog.

To get started with the Kafka Streams API, most users typically begin with our Confluent demo applications or the Kafka Streams API chapter in the Confluent documentation. To make your getting-started experience even better, we recently added a new Docker-based demo setup. This Docker-based demo is the focus of this blog post and, because the demo is a one-click experience, the remainder of this post will be quite short and concise!

Creating the Kafka Music Demo

We will run the Confluent Kafka Music demo application in a containerized, multi-service deployment, using Docker. If you are reading this blog post for the first time, this will take you about five minutes. Afterward, this will take just a few seconds!

Our Kafka Music application demonstrates how to build a music charts application that continuously computes, in real time, the latest charts such as “Top 5 songs” per music genre. It exposes its latest Streams processing results (the latest music charts) through Kafka’s Interactive Queries feature (see our documentation on Interactive Queries) combined with a REST API. The application’s input data is in Avro format and comes from two sources: a stream of play events (think: “song X was just played”) and a stream of song metadata (“song X was written by artist Y”). The corresponding Avro schemas are registered with the Confluent Schema Registry instance because that’s how one creates production-ready data streams.
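
To make the idea of continuously updated charts a bit more concrete, here is a minimal, in-memory sketch in Python. It is not the demo's actual implementation and does not use the Kafka Streams API, Avro, or Interactive Queries. The sample songs, the play events, and the names `MusicCharts`, `on_play_event`, and `top` are all hypothetical and exist only for illustration. The sketch shows the core logic only: each play event is matched against the song metadata, a per-genre play count is updated, and the current top N songs can be queried at any time.

```python
from collections import Counter, defaultdict

# Hypothetical song metadata ("song X was written by artist Y").
# The real demo reads this from a Kafka topic in Avro format.
SONGS = {
    1: {"name": "Song A", "artist": "Artist X", "genre": "punk"},
    2: {"name": "Song B", "artist": "Artist Y", "genre": "punk"},
    3: {"name": "Song C", "artist": "Artist Z", "genre": "hip hop"},
}

# Hypothetical play events ("song X was just played"), as a list of song ids.
PLAY_EVENTS = [1, 2, 1, 3, 1, 2, 3]


class MusicCharts:
    """Keeps per-genre play counts and answers "top N" queries."""

    def __init__(self, songs):
        self._songs = songs
        self._plays_by_genre = defaultdict(Counter)

    def on_play_event(self, song_id):
        """Update the charts for one play event; unknown songs are ignored."""
        song = self._songs.get(song_id)
        if song is None:
            return
        self._plays_by_genre[song["genre"]][song_id] += 1

    def top(self, genre, n=5):
        """Return up to n (song name, play count) pairs, most played first."""
        plays = self._plays_by_genre.get(genre, Counter())
        return [
            (self._songs[song_id]["name"], count)
            for song_id, count in plays.most_common(n)
        ]


charts = MusicCharts(SONGS)
for event in PLAY_EVENTS:
    charts.on_play_event(event)

print(charts.top("punk"))     # [('Song A', 3), ('Song B', 2)]
print(charts.top("hip hop"))  # [('Song C', 2)]
```

In the real application the counts live in fault-tolerant state stores managed by Kafka Streams rather than in a Python dictionary, which is what makes them queryable through Interactive Queries.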

We will run the following containerized services:

If you first want to see a preview of what we will do in the subsequent sections, take a look at the following screencast:

Screencast: Running Confluent Kafka Music demo application (3 mins)

Prerequisite

There is only one requirement to meet: you must install a recent version of Docker and Docker Compose on your host machine (e.g., your laptop running macOS, Linux, or Windows) if you haven’t done so already. If you are on a Mac, follow the instructions at Docker for Mac. The Confluent Docker images require Docker version 1.11 or greater.
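
If you would like to check the version requirement programmatically, the following Python sketch parses the output of `docker --version` and compares it against the 1.11 minimum stated above. The helper names (`parse_docker_version`, `meets_minimum`, `installed_docker_version_output`) are hypothetical, and the version string in the usage comment is only a sample, not a recommendation.

```python
import re
import subprocess

# Minimum Docker version required by the Confluent Docker images (per this post).
MINIMUM_DOCKER_VERSION = (1, 11)


def parse_docker_version(version_output):
    """Extract the version as a tuple of ints from `docker --version` output."""
    match = re.search(r"(\d+)\.(\d+)(?:\.(\d+))?", version_output)
    if match is None:
        raise ValueError(f"no version number found in: {version_output!r}")
    return tuple(int(part) for part in match.groups() if part is not None)


def meets_minimum(version_output, minimum=MINIMUM_DOCKER_VERSION):
    """Return True if the reported version is at least the minimum."""
    return parse_docker_version(version_output) >= minimum


def installed_docker_version_output():
    """Run `docker --version` on this machine and return its output."""
    result = subprocess.run(
        ["docker", "--version"], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


# Usage (requires Docker to be installed):
#   meets_minimum(installed_docker_version_output())
# Or with a sample string:
#   meets_minimum("Docker version 17.03.1-ce, build c6d412e")
```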

For reference, I have run the instructions in this blog on a MacBook Pro with macOS Sierra and the following Docker versions:

Running the Kafka Music demo application

The first step is to clone the Confluent Docker Images repository:

Now we can launch the Kafka Music demo application including the services it depends on, such as Kafka:

After a few seconds, the application and the services are up and running. One of the started containers is continuously generating input data for the application by writing into the application’s input topics. This allows us to look at live, real-time data when using the Kafka Music application.

Now we can use our web browser or a CLI tool such as curl to interactively query the latest processing results of the Kafka Music application by accessing its REST API. In other words, we can play around now!

REST API example 1: list all running application instances of the Kafka Music application


REST API example 2: get the latest Top 5 songs across all music genres
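
As an alternative to a browser or curl, the two queries above can also be sketched as a small Python client that uses only the standard library. Please treat the specifics as assumptions: the base URL `http://localhost:7070` and the paths `/kafka-music/instances` and `/kafka-music/charts/top-five` are what I would expect for this demo, but you should verify them against the application's source code and your Docker Compose setup. The function names are hypothetical.

```python
import json
from urllib.request import urlopen

# Assumption: the demo's REST API is reachable on this host and port.
BASE_URL = "http://localhost:7070"

# Assumption: endpoint paths for the two examples above.
INSTANCES_PATH = "/kafka-music/instances"
TOP_FIVE_PATH = "/kafka-music/charts/top-five"


def endpoint(path, base_url=BASE_URL):
    """Join base URL and path without doubling or dropping slashes."""
    return f"{base_url.rstrip('/')}/{path.lstrip('/')}"


def get_json(path, base_url=BASE_URL, timeout=5.0):
    """Send a GET request and decode the JSON response body."""
    with urlopen(endpoint(path, base_url), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def main():
    """Run both example queries; requires the demo containers to be up."""
    print("Running application instances:")
    print(json.dumps(get_json(INSTANCES_PATH), indent=2))
    print("Latest Top 5 songs across all genres:")
    print(json.dumps(get_json(TOP_FIVE_PATH), indent=2))
```

Call `main()` once the containers are running. Because the input data is generated continuously, repeated calls should return changing chart results.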

The REST API exposed by the Kafka Music demo application supports further operations. See the top-level instructions in its source code for details (link points to the sources for Confluent 3.2).

If you’d like to continue exploring, perhaps by creating new Kafka topics or launching additional demonstrations, take a closer look at our Docker tutorial for Confluent 3.2.1.

Once you’re done you can stop all the services and containers with:

Conclusion and Wrapping Up

What’s great about what we have just done is not the Kafka Music example itself. Rather, it’s that you can do the very same for your own applications! You can containerize your Kafka Streams application, similar to what we have done for the Kafka Music application above, and you can also deploy it easily alongside other services such as an Apache Kafka cluster (with one or multiple brokers), Confluent Schema Registry, Confluent Control Center, and much more, including your own dockerized services. All you need is Docker and the Confluent Docker images for Apache Kafka and friends. If you need an example or template for containerizing your Kafka Streams application, take a look at the source code of the Docker image we used for this blog post.

Lastly, the image for running the Kafka Music demo application actually contains all of Confluent’s Kafka Streams demo applications. This means you can easily run any of these applications, too. I won’t cover that in this blog post, but we have instructions for how to do so.

Next Steps

If you have enjoyed this tutorial, you might want to continue with the following resources to learn more or to write your own application that uses the Kafka Streams API:
