Build Services on a Backbone of Events
Apache Kafka

Build Services on a Backbone of Events

Ben Stopford

For many, microservices are built on a protocol of requests and responses. REST etc. This approach is very natural. It is after all the way we write programs: we make calls to other code modules, await a response and continue. It also fits closely with a lot of use cases we see each day: front facing websites where users hit buttons and expect things to happen.

Connected worldBut when we step into a world of many independent services, things start to change. As the number of services grows gradually over time, the web of synchronous interactions grows with them. Previously benign availability issues start to trigger far more widespread outages. 

Our unfortunate Ops Engineers end up donning deerstalkers, as they play out distributed murder mysteries. Frantically running from service to service, piecing together snippets of second hand information (who said what, to whom, and when?).

This is a well known problem and there are a number of solutions. One is to ensure your individual services have significantly higher SLAs than your system. Google provide a protocol for doing this. The alternative is to simply break down the synchronous ties which bind services together.

We can use asynchronicity as a mechanism for doing this. If you’re working in online retail you’ll find synchronous interfaces like getImage() or processOrder() feel natural. Calls which expect an immediate response. But when a user clicks ‘Buy’ they trigger a complex and asynchronous process. A process that takes a purchase and physically delivers it to the user’s door, way beyond the context of the original button-click. So splitting software into asynchronous flows allows us to compartmentalise the different problems we need to solve, as well as allowing us to embrace a world that is itself, inherently asynchronous.

In practice we tend to embrace this automatically. We’ve all found ourselves polling database tables for changes, or implementing some kind of scheduled cron job to churn through updates. These are simple ways to break the ties of synchronicity, but they always feel like a bit of a hack. There is a good reason for this. They probably are.

So we can condense all these issues into a single observation. The imperative programming model, where we command services to do our bidding, isn’t a great fit for estates where services are operated independently.

In this post we’re going to look at the other side of the architecture coin: composing services not through chains of commands, but rather through streams of events. This is a valid approach in its own right. It also forms a baseline for the more advanced patterns we’ll be discussing later in this series, where we blend the ideas of event-driven processing with those seen in Streaming Platforms.

Request driven way

Commands, Events and Queries

Before we dive into an example, we need to fix three simple concepts. There are three ways that services can interact with one another: Commands, Events and Queries. If you’ve not considered the distinction between these three before, it’s well worth doing so. 

Commands, event, query

The beauty of events is they are both fact and trigger. Data-on-the-outside that can be reused by any service in the system. But from a services perspective Events lead to less coupling than Commands and Queries. This fact is important.

The three mechanisms through which services interact:
  1. Commands are an action. A request for some operation to be performed in another service. Something that will change the state of the system. Commands expect a response.
  2. Events are both a Fact and a Trigger. Something that has happened, expressed as a notification.
  3. Queries are a request to look something up. Importantly, queries are side effect free; they leave the state of the system unchanged.


A Simple Event Driven Flow

Let’s start with a simple example: a customer ordering a widget. Two things happen next:

  1. The associated payment is processed.
  2. The system checks to see if more widgets need to be ordered.

In the request-driven approach this can be represented as a chain of commands. There are currently no queries. The interaction would look like this:

simple event driven flow

The first thing to note is the business process for “Buying more Stock” is initiated by (called by) the Orders Service. This mingles responsibilities across the two services. Ideally we’d have a better separation of concerns.

Now if we can represent the same flow using an Event Driven approach things work a little better.

  1. The UI Service raises OrderRequested and awaits OrderConfirmed (or Rejected) before returning to the user.
  2. Both the Orders Service and the Stock Service react to the raised event.

UI services

Looking at this closely, the Interaction between the UI service and the Orders Service hasn’t changed all that much, other than they communicate via events, rather than calling one another directly.

The Stock service is interesting though. No longer does the Orders Service tell it what to do. It controls whether it partakes in the interaction, or not. This is a very important property of this type of architecture, termed Receiver Driven Flow Control. Logic is pushed to the receiver of the events, rather than the sender. The burden of responsibility is flipped!

Flipping control to receivers reduces the coupling between services, which opens up an important level of pluggability to the architecture. Components can be swapped in and out with ease.

This element of pluggability becomes increasingly important as architectures become more complex. Say we were to add a service that manages pricing in real time, tweaking a product’s price based on supply and demand. In a command-driven world we would need to introduce a maybeUpdatePrice() method which is called by both the Stock Service and the Orders Service. But in the event-driven world re-pricing is just a service that subscribes to the shared stream, sending out price updates when relevant criteria are met.

UI service, orders service, pricing service, stock service

Blending Events and Queries

The above example considered only commands/events. There were no queries (remember we defined all interactions as being one of commands, events and queries earlier). Queries are a necessity for all but the simplest architecture. So let’s extend the example a bit, making the Orders Service check that there is sufficient Stock before it processes a Payment.

The request-driven approach to this would involve sending a query to the Stock service to retrieve the current stock count. This leads to a hybrid model, where the event stream is used purely for notification, allowing any service to dip into the flow, but queries go directly to source.

ui service, orders service, REST stock service

For larger ecosystems, where services need to evolve independently, remote queries add a lot of coupling, tying services together at runtime. We can avoid such cross-context queries by internalising them. The stream of events is used to cache datasets in each service, where they can be queried locally.

So to add this stock check, the Orders Service would subscribe to the stream of Stock events, storing them locally in a database. It would then query this ‘view’ to validate there is sufficient stock.

Orders Service would subscribe to the stream of Stock events

Pure event-driven systems have no concept of remote queries – events propagate state to services where it is queried locally

There are three advantages of this ‘Query by Event-Carried State Transfer’ approach:

  1. Better decoupling: Queries are local. They involve no cross-context calls. This ties services far less tightly than their request-driven brethren.
  2. Better autonomy: The Orders service has a private copy of the Stock dataset so it can do whatever it likes with it, rather than being limited to the query functionality offered by the Stock Service.
  3. Efficient Joins: If we were to “look up the stock” on every order, we would effectively be doing a join over the network between the two services. As workloads grow, or more sources need to be combined, this can be increasingly arduous. ‘Query by Event Carried State Transfer’ solves this issue by bringing queries (and joins) local.

This approach isn’t without its downsides though. Services become inherently stateful. They need to keep track of, and curate, the propagated data set over time. The duplication of state can also make some problems harder to reason about (how do we decrement the stock count atomically?) and we should be careful of divergence over time. But all these issues have workable solutions, they just need a little consideration.

The Single Writer Principle

A useful principle to apply with this style of system is to assign responsibility for propagating events of a specific type, to a single service: a single writer. So the Stock Service would own how the ‘Inventory of Stock’ progresses forward over time, the Orders Service would own Orders, etc.

This helps funnel consistency, validation and other ‘write path’ concerns through a single code path (although not necessarily a single process). So in the example below, notice that the Order Service takes control of every state change made to an Order, but the whole event flow spans Orders, Payments and Shipments, each managed by their respective services.

Assigning responsibility for event propagation is important because these aren’t just ephemeral events, or transient chit-chat. They represent shared facts, the data-on-the-outside. As such, services need to take responsibility for curating these shared datasets over time: fixing errors, handling situations where schemas change etc.

 a topic, in Kafka, for Orders, Shipments and Payments

Here each color represents a topic, in Kafka, for Orders, Shipments and Payments. The basket service starts the ball rolling. When a user clicks ‘buy’ it raises Order Requested, waiting for the Order Confirmed event before it responds back to the user. The three other services handle the state transitions that pertain to their section of the workflow. So for example, after the payment is processed, the Order service pushes the Order from Validated to Confirmed.

Blending Patterns and Clustering Services

For some the pattern described above will look like Enterprise Messaging, but it is subtly different. Enterprise Messaging, in practice, focuses on state transfer, effectively tying databases together across a network.

Event Collaboration is about services progressing some business goal, through a cascade of events, which trigger services into action. So it’s a pattern for business processing, rather than simply a mechanism for moving state.

But we typically want to leverage both “faces” of this pattern in the systems we build. In fact, one of the beauties of this pattern is it can handle both the micro and the macro, or be hybridized where it makes sense.

Combining patterns is also quite common. We might want the flexibility of remote queries rather than the overhead of maintaining a dataset locally, particularly as datasets grow. This makes it easier to deploy simple functions (which is important if we want to compose lightweight, serverless-styled, event flows), or because we’re in a stateless container or browser.

The trick is to limit the scope of these query interfaces, ideally to within a bounded context*. It’s typically better to have an architecture with many specific, targeted views rather than a single, shared datastore. (*A bounded context, here, is a set of services which share the same deployment cycle or domain model.)

To limit the scope of remote queries we can use a clustered context pattern. Here event flows are the sole communication pattern between contexts. But services within a context leverage both event-driven processing and request-driven views, where they are needed.

In the example below we have three divisions, which communicate with one another only through events. Inside each one we use finer grained event-driven flows. Some of these include a view tier (query layer). This balances coupling with convenience, allowing us to blend fine-grained services with the larger entities; the legacy applications or off the shelf products, which inhabit many real world service estates.

event tier & view tier

Clustered Context Model

The five key advantages of event-driven services:
  1. Decoupling: Break long chains of blocking commands. Decompose synchronous workflows. Brokers decouple services so it’s easier to plug new ones in, or change those that are there. 
  2. Offline/Async flows: When the user clicks a button many things happen. Some synchronously, some asynchronously. Design for the latter and the former falls out for free. 
  3. State Transfer: event become the dataset of your system. Streams provide an efficient mechanism for distributing datasets, so they can be reconstituted, and queried, inside a bounded context.
  4. Joins: It’s easier to combine/join/augment datasets from different services. Joins are fast and local.
  5. Traceability: It’s easier to debug the distributed ‘murder mystery’ when there’s a central, immutable, retentive narrative journaling each interaction as it unfolds in time.

Summing Up

So in the event driven approach we use Events instead of Commands. Events trigger processing. They are also turned into views we can query locally. We revert back to remote, synchronous queries where necessary, particularly in smaller ecosystems, but we limit their scope in larger ones (ideally to a single bounded context). 

But all these approaches are just patterns. Guiding principles for sewing systems together. We shouldn’t be too dogmatic about them. For example a global query service can still be a good idea, if it is something that rarely changes (like a Single Sign On service).

The trick is to start with a baseline of events. Events provide far less opportunity for services to couple themselves to one another, and flipping flow-control to the receiver makes for better separated concerns and better pluggability.

The other interesting thing about event-driven approaches is they work as well for large, complex architectures as they do for small, highly collaborative ones. A backbone of events gives services the autonomy they need to evolve freely. Missing the complex ties of commands and queries. As for the Ops engineers, they’ll still be wearing deerstalkers and playing murder mystery, but hopefully not quite as often, and at least now the story comes with a script!

But with all this talk of events, we’ve talked little of distributed logs or stream processing. When we apply this pattern with Kafka the rules of the system change. Retention in the broker becomes a tool we can design for, allowing us to embrace data-on-the-outside. Something services can refer back to.

Streaming Platforms sit very naturally with this model of processing events and building views. Views embedded directly inside a service, views a service queries remotely, or views materialised as an on-going stream.

This leads to a whole host of optimisations: leveraging the duality between event stream and event store, blending in stream processing tools that sift through the narrative, joining streams from many services and materializing views we can query. These patterns are empowering. They allow us to reimagine business processing using a toolset specifically designed for handling streams of events. But all these optimisations build on the concepts discussed here, simply applied with a more contemporary toolset.

In the next post, we will look at how we can use the various features Kafka offers to build scalable, highly-available event driven services.

Thanks to Antony Stubbs, Tim Berglund, Kaufman Ng, Gwen Shapira and Jay Kreps for their help reviewing this post.

Find out More:

Subscribe to the Confluent Blog

Email *

More Articles Like This

Ben Stopford

Using Apache Kafka as a Scalable, Event Driven Backbone for Service Architectures

Ben Stopford . .

The last post in this microservices series looked at building systems on a backbone of events, where events become both a trigger as well as a mechanism for distributing state. ...

Eno Thereska

Watermarks, Tables, Event Time, and the Dataflow Model

Eno Thereska . .

This post was co-written with Michael Noll, Product Manager, Confluent, and Matthias J. Sax, Engineer, Confluent. The Google Dataflow team has done a fantastic job in evangelizing their model of handling ...

Ben Stopford

The Data Dichotomy: Rethinking the Way We Treat Data and Services

Ben Stopford . .

Part 1 of the Event-Driven Services Series If you were to stumble upon the whole microservices thing, without any prior context, you’d be forgiven for thinking it a little strange. Taking ...

Leave a Reply

Your email address will not be published. Required fields are marked *


  1. Thanks for the thorough post. It’s interesting to be in this transition period between HTTP/REST and event-driven service communication. While I agree that event-based communication between services for fire-and-forget type communication is ideal, re-purposing event streams for query type communication is still not ideal, especially where systems connect to the wider web which is an ecosystem built around synchronous communication. Protocols like Websocket may change this, but the RPC protocols over Websocket are ad-hoc and widespread tooling (cURL) and standards (OAuth) are not available. Building local views for querying is often times not feasible, especially for more complex systems like search. Developers working on systems whose sole interprocess communication is via event streams have to build special tooling and systems to efficiently debug systems issues or use GUI tools to post/get messages.

    So I think event streams are a key part of a resilient systems architecture, I would resist seeing all microservice communication as the nail for that particular hammer.

    1. Hi Baq
      Thanks for leaving a comment. I certainly agree that repurposing event streams for queries is not a great idea something we’re suggesting here. Not in the sense of ‘synthesizing’ request-response over Kafka. But the event collaboration pattern provides a useful alternative nonetheless.
      Typically a user interface might communicate with its webserver or other services via rest or websockets or whatever feels most natural. But the webserver or other services would rely on views they create, if they use data from other backend services. These views would be in a database of some form, whatever one best suits the use case. This is a pretty common pattern in the wild.
      I would propose that having all your interaction go through Kafka makes it easier to debug issues. There is a journal to refer back to. As the tooling in this area becomes richer I think this will become an increasingly valuable asset, but there remains more work to be done.
      But what is important to note, and maybe I should have stressed more strongly here, is that this is pattern for a service architecture. Specifically services that are independently deployable, run by different teams etc. There are many use cases where a (distributed) monolith is a far better pattern to use. In such use cases its easier for services to share datastores, so the benefits of event based communication are to provide a trigger for async processing rather than also being a mechanism for moving state between services.

  2. Hi Ben
    thanks for great blog. I think, instead of building own local data with stock counts, the order service could listen to ItemsReserved event propagated by the stock service (after processing OrderConfirmed event). This would solve the atomic counter of items in the stock. But let say you have to send email notification to the user. This service would need some data from different servies (user info, email, ordered items with some details). Without local data, the email service would need to query other services or the event OrderApproved which triggers email notification would contain all necessary data (collected in the whole process). So, with local store we can not only decouple services, but also keep events really simple.


    1. Hi Tom

      Yes that should work. There are a few options that should. For the email service also, it’s a boonI I’ll clarify the post a bit, although I don’t want to delve into too much detail. Good comment though. Thanks.


  3. One problem that gets overlooked most is how important a typed versioned interface is between micro services; Protobuffer for messaging would be an answer here; also GRPC has async streams which is like the best of both words; Force fitting all scenarios into one solution domain looks counter productive. The idea could be to use both request-response as well as event driven, and with typed ,versioned interface

    1. Hi Alex

      Contracts are most certainly important. Particularly between teams/deployment-units, but also between individual services. I was in two minds about discussing the subject in this post. One consideration that is important I think is that it’s the schema of the async messages that is most important for this particular pattern. Why? Because it’s the default way for different teams/deployment-units to communicate. So we use both Async/Brokered and Request-Response protocols, but the scope of the latter is limited, Ideally to a single deployment unit. So if you have a UI talking to a couple of back end services in the same deployment unit, REST/Json is often enough. But you want something that provides a stronger contract, particularly evolvability features between teams/deployment-units.

      Regarding GRPC I see that as a slightly different beast. It’s the broker which really adds the value here from a decoupling perspective (so more than just relieving backpressure). In fairness the pattern described here (Event Collaboration) can be performed with any async technology, but the decoupling benefits I mention only fully apply to brokered, async tech. Later in the series we get into more Kafka specific patterns, which do a better job (I hope) of elucidating the value of a broker that has good scalability and retention properties.


  4. Hi Ben,

    When you talk about asynchronous Request/Reply over Kafka for example “OrderRequested” and “OrderConfirmed”. How do you implement that with Kafka? In other message brokers you have headers or temporal queues/topics to cover that …


    1. Hi Fernando

      Yes Kafka doesn’t provide message headers yet, although they are likely to be added soon (see below), so you’d deserialise the message. Filter functionality is provided by the Streams API. You can run this as a separate component but for a use case like this you’d just do the filtration in the client service, either with streams or roll your own. So the Orders service subscribes to the stream of orders, reacting only to OrderConfirmed.


  5. Thank Ben, another great article!
    The principles all resonate with me – the event-only model has deep consistency and symmetry.
    The killer features for me are:
    * complete system replay, with no loss of fidelity and no audit black-holes (“oops, I forgot to log that step …”)
    * naturally scalable

    One thought:
    The choice of toolkit really helps (e.g. async jax-rs is perhaps not best choice – but vert.x has been perfect for interfacing with Kafka).

    1. Hi Fuzz – yeah replay is super useful. Event sourcing for the data you share 🙂 There is a post on just that on its way in fact (it’s the next but one). Haven’t tried vert.x – will have to check it out.

  6. A red flag for me with ‘Query by Event Propagation’ seems to be duplicity of data with the service that is not the owner of the given data’s CRUD operations. Eventual consistency is not easy to maintain in a micro-service environment and also in a way goes to make the micro-service not ‘micro’ anymore if it has to maintain states of data owned by several other services. Probably making it a macro-service environment 🙂

    1. Hi Kunal.

      The eventual consistency issues are actually one key reason why the single writer principal is important. But at the same time, if you have a service that composes data from several others, it’s hard to perform the required join at runtime. This tends to push you down this route.

      In fact there are a number of issues with the “event propagation” approach in practice. Later in this series we’ll be covering some more advanced mechanisms for doing this as we blend in stream processing. This really changes things.

Try Confluent Platform

Download Now