What’s great about the Kafka Streams API is not just how fast your application can process data with it, but also how fast you can get up and running with your application in the first place, regardless of whether you are implementing your applications in Java or other JVM-based languages such as Scala and Clojure. Unlike competing technologies, Apache Kafka® and its Streams API do not require installing a separate processing cluster, and they are equally viable for small, medium, large, and very large use cases.
In fact, it’s pretty common for our users to have their first application or proof-of-concept running in a matter of minutes. Some users, for example, opt to test-drive and develop their applications on their laptops against embedded, in-memory instances of Kafka and related services such as Confluent Schema Registry. And they also use the same setup for automated integration testing in CI environments backed by Jenkins or Travis CI. Our own GitHub repo containing the Confluent demo applications uses exactly such a setup.
Many developers love container technologies such as Docker and the Confluent Docker images to speed up the iterative development they’re doing on their laptops: for example, to quickly spin up a containerized Confluent Platform deployment consisting of multiple services such as Apache Kafka, Confluent Schema Registry, and Confluent REST Proxy for Kafka.
Docker is also a very popular choice among Kafka users for containerizing and deploying applications and microservices on platforms such as Kubernetes or in the cloud. Related technologies such as Apache® Spark™ or Apache Flink® require you to install and run special processing clusters into which you then submit cluster-specific “processing jobs.” Applications that use the Kafka Streams API, by contrast, are standard Java applications, so you can containerize them like any other. (As a side note, these applications are backwards and forwards compatible with Kafka cluster versions, which makes such deployments flexible enough to accommodate independently working teams across a company.) This also means you can use the same organizational processes and technical tooling for development, testing, packaging, deployment, and monitoring of Kafka Streams applications that you use everywhere else inside your company. For example, if you don’t like containers but prefer deploying to VMs with Puppet or Ansible, no problem. If you do like containers and enjoy deploying to Kubernetes or a cloud service like AWS EC2, no problem either. And speaking of containers and Docker, this brings us to the focus of this blog post.
To get started with the Kafka Streams API, most users begin with our Confluent demo applications or the Kafka Streams API chapter in the Confluent documentation. To make getting started even easier, we recently added a new Docker-based demo setup. This Docker-based demo is the focus of this blog post and, because the demo is a one-click experience, the remainder of this post will be quite short!
We will run the Confluent Kafka Music demo application in a containerized, multi-service deployment, using Docker. The first time you follow this blog post, this will take you about five minutes. Every run after that will take just a few seconds!
Our Kafka Music application demonstrates how to build a music charts application that continuously computes, in real-time, the latest charts such as “Top 5 songs” per music genre. It exposes its latest Streams processing results—the latest music charts—through Kafka’s Interactive Queries feature (see our documentation on Interactive Queries) combined with a REST API. The application’s input data is in Avro format and comes from two sources: a stream of play events (think: “song X was just played”) and a stream of song metadata (“song X was written by artist Y”). The corresponding Avro schemas are registered with the Confluent Schema Registry instance because that’s how one creates production-ready data streams.
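To make the charts computation more concrete, here is a minimal plain-Java sketch of the "Top N songs per genre" logic: count play events per song, look up each song's metadata, and rank within each genre. It is an in-memory illustration only, not the demo's actual Kafka Streams code. The `MusicChartsSketch` and `Song` types, their fields, and the sample data are all hypothetical.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

/** In-memory sketch of "Top N songs per genre"; NOT the demo's Kafka Streams code. */
final class MusicChartsSketch {

    /** Hypothetical song metadata; the demo's real Avro schema may differ. */
    static final class Song {
        final long id;
        final String name;
        final String genre;

        Song(long id, String name, String genre) {
            this.id = id;
            this.name = name;
            this.genre = genre;
        }
    }

    /** Each play event is represented by the id of the song that was played. */
    static Map<String, List<String>> topSongsPerGenre(
            List<Long> playEvents, Map<Long, Song> songsById, int n) {
        // Step 1: count how often each song was played.
        Map<Long, Long> playCounts = playEvents.stream()
                .collect(Collectors.groupingBy(id -> id, Collectors.counting()));

        // Step 2: join the counts with the metadata and group them by genre.
        Map<String, List<Map.Entry<Long, Long>>> countsByGenre = playCounts.entrySet().stream()
                .filter(e -> songsById.containsKey(e.getKey()))
                .collect(Collectors.groupingBy(e -> songsById.get(e.getKey()).genre));

        // Most plays first; ties are broken by song id to keep the result deterministic.
        Comparator<Map.Entry<Long, Long>> byPlaysDescThenId = (a, b) -> {
            int byPlays = Long.compare(b.getValue(), a.getValue());
            return byPlays != 0 ? byPlays : Long.compare(a.getKey(), b.getKey());
        };

        // Step 3: keep the n most-played song names per genre.
        Map<String, List<String>> charts = new TreeMap<>();
        for (Map.Entry<String, List<Map.Entry<Long, Long>>> genre : countsByGenre.entrySet()) {
            List<String> top = genre.getValue().stream()
                    .sorted(byPlaysDescThenId)
                    .limit(n)
                    .map(e -> songsById.get(e.getKey()).name)
                    .collect(Collectors.toList());
            charts.put(genre.getKey(), top);
        }
        return charts;
    }

    public static void main(String[] args) {
        Map<Long, Song> songs = new HashMap<>();
        songs.put(1L, new Song(1L, "Song A", "punk"));
        songs.put(2L, new Song(2L, "Song B", "punk"));
        songs.put(3L, new Song(3L, "Song C", "jazz"));

        List<Long> plays = Arrays.asList(1L, 2L, 2L, 3L, 2L, 1L);
        // Prints: {jazz=[Song C], punk=[Song B, Song A]}
        System.out.println(topSongsPerGenre(plays, songs, 5));
    }
}
```

The difference from the real application is that this sketch works on a finite list, whereas the Kafka Music application keeps the counts and charts continuously up to date as new play events arrive.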
We will run the following containerized services:
If you first want to see a preview of what we will do in the subsequent sections, take a look at the following screencast:
Screencast: Running Confluent Kafka Music demo application (3 mins)
There is only one requirement to meet: you must install a recent version of Docker and Docker Compose on your host machine (e.g., your laptop running macOS, Linux, or Windows) if you haven’t done so already. If you are on a Mac, follow the instructions at Docker for Mac. The Confluent Docker images require Docker version 1.11 or greater.
For reference, I have run the instructions in this blog post on a MacBook Pro with macOS Sierra and the following Docker versions:
The first step is to clone the Confluent Docker Images repository:
Now we can launch the Kafka Music demo application including the services it depends on, such as Kafka:
After a few seconds, the application and the services are up and running. One of the started containers is continuously generating input data for the application by writing into the application’s input topics. This allows us to look at live, real-time data when using the Kafka Music application.
Now we can use our web browser or a CLI tool such as curl to interactively query the latest processing results of the Kafka Music application by accessing its REST API. In other words, we can play around now!
REST API example 1: list all running application instances of the Kafka Music application
REST API example 2: get the latest Top 5 songs across all music genres
The REST API exposed by the Kafka Music demo application supports further operations. See the top-level instructions in its source code for details (link points to the sources for Confluent 3.2).
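If you would rather query the REST API from Java than from a browser or curl, the following sketch uses the JDK's built-in `java.net.http` client (Java 11 or later). The base URL and the two endpoint paths are assumptions made for illustration; please verify them against the instructions in the demo's source code before relying on them. The class name `KafkaMusicRestClientSketch` is hypothetical.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

/** Sketch of a tiny REST client for the demo; URL and paths are unverified assumptions. */
final class KafkaMusicRestClientSketch {

    // Assumption: where the demo's REST API is reachable from the host machine.
    static final String ASSUMED_BASE_URL = "http://localhost:7070";
    // Assumption: endpoint listing the running application instances.
    static final String ASSUMED_INSTANCES_PATH = "/kafka-music/instances";
    // Assumption: endpoint returning the Top 5 songs across all genres.
    static final String ASSUMED_TOP_FIVE_PATH = "/kafka-music/charts/top-five";

    /** Builds a GET request, tolerating a trailing or missing slash between the parts. */
    static HttpRequest buildGet(String baseUrl, String path) {
        String base = baseUrl.endsWith("/") ? baseUrl.substring(0, baseUrl.length() - 1) : baseUrl;
        String suffix = path.startsWith("/") ? path : "/" + path;
        return HttpRequest.newBuilder(URI.create(base + suffix))
                .timeout(Duration.ofSeconds(5))
                .GET()
                .build();
    }

    /** Sends the request and returns the response body, failing on non-200 responses. */
    static String fetch(HttpClient client, HttpRequest request)
            throws IOException, InterruptedException {
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Unexpected HTTP status " + response.statusCode());
        }
        return response.body();
    }

    public static void main(String[] args) throws InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        for (String path : new String[] {ASSUMED_INSTANCES_PATH, ASSUMED_TOP_FIVE_PATH}) {
            try {
                System.out.println(path + " -> " + fetch(client, buildGet(ASSUMED_BASE_URL, path)));
            } catch (IOException e) {
                System.err.println("Could not query " + path + " (is the demo running?): " + e);
            }
        }
    }
}
```

Because the input data is generated continuously, running this repeatedly while the demo is up should show the charts changing over time.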
If you’d like to continue exploring, perhaps by creating new Kafka topics or launching additional demonstrations, take a closer look at our Docker tutorial for Confluent 3.2.1.
Once you’re done, you can stop all the services and containers with:
What’s great about what we have just done is not the Kafka Music example itself. Rather, it’s that you can do the very same for your own applications! You can containerize your Kafka Streams application, similar to what we have done for the Kafka Music application above, and you can also deploy your application easily alongside other services such as an Apache Kafka cluster (with one or multiple brokers), Confluent Schema Registry, Confluent Control Center, and much more, including your own dockerized services. All you need is Docker and the Confluent Docker images for Apache Kafka and friends. If you need an example or template for containerizing your Kafka Streams application, take a look at the source code of the Docker image we used for this blog post.
Lastly, the image for running the Kafka Music demo application actually contains all of the Confluent Kafka Streams demo applications. This means you can easily run any of these applications, too. I won’t cover that in this blog post, but we have instructions for how to do so.
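When you containerize your own Kafka Streams application, one common pattern is to read connection settings from environment variables so that the same image can run against different clusters. The sketch below shows one possible way to do this in plain Java. The property keys `application.id` and `bootstrap.servers` are standard Kafka Streams configuration names, but the environment variable names, the defaults, and the class `ContainerConfigSketch` are hypothetical and are not taken from the demo image.

```java
import java.util.Map;
import java.util.Properties;

/** Sketch: derive Kafka Streams settings from container environment variables. */
final class ContainerConfigSketch {

    /** Maps hypothetical environment variables onto standard Kafka Streams config keys. */
    static Properties fromEnvironment(Map<String, String> env) {
        Properties props = new Properties();
        // STREAMS_APPLICATION_ID and its default are illustrative names, not a convention.
        props.setProperty("application.id",
                env.getOrDefault("STREAMS_APPLICATION_ID", "my-streams-app"));
        // STREAMS_BOOTSTRAP_SERVERS is likewise hypothetical; 9092 is Kafka's default port.
        props.setProperty("bootstrap.servers",
                env.getOrDefault("STREAMS_BOOTSTRAP_SERVERS", "localhost:9092"));
        return props;
    }

    public static void main(String[] args) {
        // In a container, these values would typically be set by Docker or your orchestrator.
        Properties props = fromEnvironment(System.getenv());
        props.forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

In a real application, the resulting `Properties` object would then be handed to the Kafka Streams client when it is constructed.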
If you have enjoyed this article, you might want to continue with the following resources to learn more about Apache Kafka’s Streams API: