This blog post is the second in a series about Kafka Streams, the new stream processing library of the Apache Kafka project, which was introduced in Kafka v0.10.
Current blog posts in the Kafka Streams series:
- Elastic Scaling in Kafka Streams
- Secure Stream Processing with Kafka Streams (this post)
- Data Reprocessing with Kafka Streams: Resetting a Streams Application
In this post we describe the security features of Kafka Streams. Many use cases and applications, whether in stream processing or elsewhere, have tight internal and/or external security requirements. Such requirements are common in finance and healthcare in the private sector, and also in governmental services. Legal compliance, for example, may require you to implement security measures such as encrypting data-in-transit when you work with sensitive data such as personally identifiable information. You may also be required to enforce authentication and authorization so that only a subset of employees or personnel can access the data, in line with information security policies such as need-to-know.
The question we want to answer in this blog post is:
- Question (secure stream processing): If you are working in sensitive and regulated environments, what are your security options with regard to stream processing, notably when you are using a. Apache Kafka as the foundation of your data infrastructure and b. Kafka Streams as the means to process the data in Apache Kafka?
Let’s start with a quick answer to this question:
- Answer (secure stream processing): Apache Kafka ships with a range of security features including but not limited to client authentication, client authorization, and data encryption. These security features help you to implement your security policies and to protect your valuable data against internal and external threats. On the side of processing the data in Kafka, your best option is to use Kafka Streams to build your stream processing applications because Kafka Streams integrates natively with Kafka’s security features.
We can now walk through this answer in further detail.
First, which security features are available in Apache Kafka, and thus in Kafka Streams? Kafka Streams supports all the client-side security features in Apache Kafka. In this short blog post we cannot cover these client-side security features in full detail, so I recommend reading the Kafka Security chapter in the Confluent Platform documentation and our previous blog post Apache Kafka Security 101 to familiarize yourself with the security features that are currently available in Apache Kafka.
That said, let me highlight a couple of important Kafka security features that are essential for implementing robust data infrastructures, whether these are used for building horizontal services at larger companies, for multi-tenant infrastructures (e.g. microservices), or for shared platforms such as in the Internet of Things. Later on I will demonstrate an example application that uses some of these security features in Kafka Streams.
Kafka security features include:
- Encrypting data-in-transit between the servers of a Kafka cluster: You can enable the encryption of broker-to-broker communication. Brokers communicate with each other, for example, to replicate data for fault-tolerance.
- Encrypting data-in-transit between Kafka servers and Kafka clients: You can enable the encryption of the client-server communication between the Kafka servers/brokers and Kafka clients. Kafka clients include stream processing applications built using the Kafka Streams library.
  - Example: You can configure your Kafka Streams applications to always use encryption when reading data from Kafka and when writing data to Kafka; this is very important when reading/writing data across security domains (e.g. internal network vs. public Internet or partner network).
- Client authentication: You can enable client authentication for connections from Kafka clients (including Kafka Streams) to Kafka brokers/servers.
  - Example: You can define that only some specific Kafka Streams applications are allowed to connect to your production Kafka cluster.
- Client authorization: You can enable client authorization of read/write operations by Kafka clients.
  - Example: You can define that only some specific Kafka Streams applications are allowed to read from a Kafka topic that stores sensitive data. Similarly, you can restrict write access to certain Kafka topics to only a few stream processing applications to prevent e.g. data pollution or fraudulent activities.
It’s worth noting that the aforementioned security features in Apache Kafka are optional, and it is up to you to decide whether to enable or disable any of them. And you can mix and match these security features as needed: both secured and non-secured Kafka clusters are supported, as well as a mix of authenticated, unauthenticated, encrypted and non-encrypted clients. This flexibility allows you to model the security functionality in Kafka to match your specific needs, and to make effective cost vs. benefit (read: security vs. convenience/agility) tradeoffs: tighter security requirements in places where security matters (e.g. production), and relaxed requirements in other situations (e.g. development, testing).
Second, how do you use these security features in Kafka Streams, i.e. when building your own stream processing applications? The most important aspect to understand is that Kafka Streams leverages the standard Kafka producer and consumer clients behind the scenes. Hence what you need to do to secure your stream processing applications is to configure the appropriate security settings of the corresponding Kafka producer/consumer clients. Once you know which client-side security features you want to use, you simply need to include the corresponding settings in the configuration of your Kafka Streams application.
Let’s show a simple example, based on our previous blog post Apache Kafka Security 101. What we want to do is to configure our Kafka Streams application to 1. encrypt data-in-transit when communicating with its target Kafka cluster and 2. enable client authentication.
For the sake of brevity, we assume that a. the security setup of the Kafka brokers in the cluster is already completed and b. the necessary SSL certificates are available to your Kafka Streams application in the filesystem locations that its configuration refers to (the aforementioned blog post walks you through the steps to generate them). For example, if you are using Docker to containerize your Kafka Streams applications, then you must also include these SSL certificates in the right locations within the Docker image.
Once these two assumptions are met, you only need to configure the corresponding settings for the Kafka clients in your Kafka Streams application: the settings that enable client authentication and SSL encryption for data-in-transit between your Kafka Streams application and the Kafka cluster it is reading from and writing to.
Within a Kafka Streams application, you’d configure these settings in code, as part of the configuration that you pass to the Kafka Streams library.
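As a minimal sketch of what such configuration code could look like, the example below collects the relevant client settings in a plain `java.util.Properties` object. The setting names (`security.protocol`, `ssl.truststore.location`, and so on) are the standard Kafka client configuration keys. Everything else is an assumption made for illustration: the class name `SecureStreamsSettings`, the application ID, the broker address, the file paths, and the passwords are all hypothetical placeholders. To keep the sketch runnable without the Kafka libraries on the classpath, it uses the setting names as plain strings and does not start a Kafka Streams application; in a real application you would pass the resulting properties to Kafka Streams.

```java
import java.util.Properties;

final class SecureStreamsSettings {

    /**
     * Builds the client settings for SSL encryption and client authentication.
     * All argument values are supplied by the caller; none are prescribed by Kafka.
     */
    static Properties build(String bootstrapServers,
                            String truststoreLocation, String truststorePassword,
                            String keystoreLocation, String keystorePassword,
                            String keyPassword) {
        Properties settings = new Properties();
        // Hypothetical application ID, chosen only for this sketch.
        settings.put("application.id", "secure-kafka-streams-app");
        // Where to find the secured Kafka brokers.
        settings.put("bootstrap.servers", bootstrapServers);
        // Encrypt data-in-transit between the application and the brokers.
        settings.put("security.protocol", "SSL");
        // Trust store: lets the application verify the brokers' certificates.
        settings.put("ssl.truststore.location", truststoreLocation);
        settings.put("ssl.truststore.password", truststorePassword);
        // Key store: lets the application authenticate itself to the brokers.
        settings.put("ssl.keystore.location", keystoreLocation);
        settings.put("ssl.keystore.password", keystorePassword);
        settings.put("ssl.key.password", keyPassword);
        return settings;
    }

    public static void main(String[] args) {
        // Hypothetical host, port, paths, and passwords; replace with your own.
        Properties settings = build(
                "kafka.example.com:9093",
                "/path/to/kafka.client.truststore.jks", "truststore-secret",
                "/path/to/kafka.client.keystore.jks", "keystore-secret",
                "key-secret");
        // Print only the setting names so that no password ends up in the output.
        settings.stringPropertyNames().stream().sorted().forEach(System.out::println);
    }
}
```

These settings must match the security setup of the Kafka cluster, and the trust store and key store files must be readable by the application at the given locations. In practice you would likely load the passwords from a protected source rather than hard-coding them.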
With these settings in place your Kafka Streams application will encrypt any data-in-transit that is being read from or written to Kafka, and it will also authenticate itself against the Kafka brokers that it is talking to. (Note that this simple example does not cover client authorization.)
Now what would happen if you misconfigured the security settings in your Kafka Streams application? In this case, the application would fail at runtime, right after you started it. For example, if you entered an incorrect password for the ssl.keystore.password setting, then error messages would be logged, and after that the application would terminate.
Similar exceptions would be thrown if you misconfigured other security settings.
Your Operations team can monitor the log files of your Kafka Streams applications for such error messages to spot any misconfigured applications quickly, and to alert the corresponding teams.
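One way to catch a wrong key store password even earlier is a small pre-flight check that runs before the application starts. The sketch below is not something Kafka Streams does or requires. It is an optional check built only on the JDK's `java.security.KeyStore` API, and it assumes a JKS key store. The class name `KeystorePreflightCheck`, its method names, and the passwords are hypothetical. The helper `createEmptyKeystore` exists only so that the example can run without a real certificate.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.GeneralSecurityException;
import java.security.KeyStore;

final class KeystorePreflightCheck {

    /** Returns true if the JKS key store at the given path opens with the given password. */
    static boolean canOpen(Path keystorePath, char[] password) {
        try (InputStream in = Files.newInputStream(keystorePath)) {
            KeyStore keyStore = KeyStore.getInstance("JKS");
            // With a non-null password, load() verifies the key store's integrity
            // and throws an IOException if the password is incorrect.
            keyStore.load(in, password);
            return true;
        } catch (IOException | GeneralSecurityException e) {
            // Unreadable file, unsupported format, or incorrect password.
            return false;
        }
    }

    /** Demo helper: writes an empty, password-protected JKS key store to a temp file. */
    static Path createEmptyKeystore(char[] password) {
        try {
            Path path = Files.createTempFile("demo-keystore", ".jks");
            KeyStore keyStore = KeyStore.getInstance("JKS");
            keyStore.load(null, null); // initialize an empty key store
            try (OutputStream out = Files.newOutputStream(path)) {
                keyStore.store(out, password);
            }
            return path;
        } catch (IOException | GeneralSecurityException e) {
            throw new IllegalStateException("Could not create demo key store", e);
        }
    }

    public static void main(String[] args) {
        char[] correct = "correct-password".toCharArray(); // hypothetical password
        Path keystore = createEmptyKeystore(correct);
        System.out.println("correct password accepted: " + canOpen(keystore, correct));
        System.out.println("wrong password accepted: "
                + canOpen(keystore, "wrong-password".toCharArray()));
        keystore.toFile().delete();
    }
}
```

A check like this covers only the key store password. It does not validate the private key password, the trust store, or whether the brokers accept the certificate, so monitoring the application logs remains necessary.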
In summary, Kafka Streams lets you secure your stream processing applications through its native integration with Apache Kafka’s security functionality.
If you are interested in further information on Kafka Streams and security features, I’d recommend the following references:
- Kafka Streams documentation (Confluent Platform 3.0.0), notably the Kafka Security chapter
- Apache Kafka Security 101, Confluent blog post, February 2016
- Demo application SecureKafkaStreamsExample under confluentinc/examples
- Video talk: Introduction to Kafka Streams (slides), June 2016
And if you want to get started implementing your own Kafka Streams applications, you may want to:
- Download Confluent Platform 3.0.0, which includes Apache Kafka 0.10 with Kafka Streams
- Read the Kafka Streams demo applications under confluentinc/examples and apache/kafka
- Join our bi-weekly Kafka Streams Ask-Me-Anything sessions where you can chat with our Kafka Streams engineering team. Contact us to receive an invite!