What is Pub/Sub?

Publish/subscribe messaging, also known as pub/sub, is a messaging framework commonly used for asynchronous communication between services. It is represented by a logical component, often called a topic or routing key, that delivers the same message to multiple clients.

The counterpart to pub/sub messaging is point-to-point messaging, represented by a component called a messaging queue. A queue delivers each message to one client.
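The difference between the two delivery models can be sketched with a few lines of in-memory Python. This is only an illustration: the `Topic` and `Queue` classes and their method names are hypothetical and do not correspond to any real messaging API.

```python
from collections import deque


class Topic:
    """Pub/sub: every subscriber receives its own copy of each message."""

    def __init__(self):
        self.subscribers = {}  # subscriber name -> list of received messages

    def subscribe(self, name):
        self.subscribers[name] = []

    def publish(self, message):
        # Fan out: the same message is delivered to every subscriber.
        for inbox in self.subscribers.values():
            inbox.append(message)


class Queue:
    """Point-to-point: each message is handed to exactly one consumer."""

    def __init__(self):
        self.pending = deque()

    def send(self, message):
        self.pending.append(message)

    def receive(self):
        # Receiving removes the message, so no other consumer can get it.
        return self.pending.popleft() if self.pending else None


topic = Topic()
topic.subscribe("billing")
topic.subscribe("shipping")
topic.publish("order-1")

queue = Queue()
queue.send("order-1")
first = queue.receive()   # the one consumer that gets the message
second = queue.receive()  # nothing left for a second consumer
```

After the run, both `billing` and `shipping` hold a copy of `order-1`, while only the first `receive()` call on the queue returns it.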

How Pub/Sub Works

Sending and Receiving Messages

Both pub/sub messaging and messaging queues decouple the applications that send messages (producers) from the ones that receive them (consumers). This arrangement allows for an asynchronous communication model: it removes or mitigates the delays and impedances associated with direct, synchronous communication between two applications.

Messaging Architecture

Messaging systems decouple applications that produce messages from the ones that consume them. The role names “producer” and “consumer” themselves focus on an application’s immediate context rather than a static role in an architectural pattern (e.g., client-server). The emphasis shifts instead to the message as a time-based fact, or event.

The term publishing suggests an indefinite duration of a message’s usefulness when it is intended for an arbitrary number of consumers (subscribers). These subscribers, while interested in the same messages, all have their own use cases and life cycles. It often makes sense for the topic to persist messages in a way that allows subscribers to review the message store and leave room for later use cases.

Once a consumer or consumer group subscribes to a Kafka topic, it sets a position (offset) from which to read the events in the topic. Each message in a Kafka topic is an immutable record of some event. It can be deleted (e.g., when it ages out of the topic's retention window), but it cannot be otherwise modified. Each record is structured as a key-value pair.
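To make the idea of offsets concrete, here is a minimal in-memory sketch of an append-only log read by two consumers that start at different positions. `PartitionLog` and `Consumer` are hypothetical names chosen for illustration; a real Kafka client tracks and commits offsets through its own API.

```python
class PartitionLog:
    """Hypothetical in-memory stand-in for an append-only log of records."""

    def __init__(self):
        self._records = []  # each record is an immutable (key, value) tuple

    def append(self, key, value):
        self._records.append((key, value))
        return len(self._records) - 1  # offset assigned to the new record

    def read(self, offset, max_records=10):
        # Reading never removes or changes records.
        return self._records[offset:offset + max_records]


class Consumer:
    """Tracks its own read position (offset) in the log."""

    def __init__(self, log, start_offset=0):
        self.log = log
        self.offset = start_offset

    def poll(self, max_records=10):
        batch = self.log.read(self.offset, max_records)
        self.offset += len(batch)  # advance past what was just read
        return batch


log = PartitionLog()
log.append("user-1", "login")
log.append("user-2", "login")
log.append("user-1", "logout")

early = Consumer(log, start_offset=0)  # reads the full history
late = Consumer(log, start_offset=2)   # starts at a later position

early_batch = early.poll()
late_batch = late.poll()
```

Because reading does not consume anything, the two consumers progress independently and the log still holds all three records afterward.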

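If the record key is not null, the producer uses it to determine the partition to which it sends the record, so every record with a given key is written to the same partition (as long as the topic's partition count does not change). Because each partition preserves the order in which records are appended, key-based partitioning, together with idempotent producers that keep retries from duplicating or reordering writes, provides an ordering guarantee for all records that share a key. If the record key is null, the producer chooses the partition itself, spreading records across partitions rather than tying them to any one of them.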
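A rough sketch of key-based partition selection follows. It assumes a CRC32 hash purely as a stand-in (Kafka's own clients use a different hash function) and a simple round-robin for records without a key (real clients may use other strategies). The function name `choose_partition` is hypothetical.

```python
import zlib
from itertools import count

_round_robin = count()  # used only for records that have no key


def choose_partition(key, num_partitions):
    """Pick a partition for a record.

    Assumption: CRC32 stands in for the client's real hash function.
    """
    if key is None:
        # No key: spread records across partitions in turn.
        return next(_round_robin) % num_partitions
    # Same key bytes -> same hash -> same partition.
    return zlib.crc32(key) % num_partitions


p1 = choose_partition(b"customer-42", 6)
p2 = choose_partition(b"customer-42", 6)
unkeyed = [choose_partition(None, 6) for _ in range(3)]
```

Here `p1` and `p2` are always equal, which is what keeps all records for one key in a single, ordered partition. Note that the result depends on `num_partitions`, so changing the partition count changes where a key lands.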

The topic receives each record from a producer in the form of a serialized byte array. Consumers must know the proper deserialization method to decode the message contents. So the data format acts as an API between producers and consumers. Producers choose the schema and serialization format, and hopefully document that choice for consumers and make that schema available in a “schema registry” or searchable catalog.
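As a small illustration of the data format acting as the contract, the sketch below uses JSON encoded as UTF-8 for both directions. The format, the function names, and the sample `order` fields are assumptions made for the example; producers in practice often choose other formats and publish the schema separately.

```python
import json


def serialize_order(order):
    """Producer side: turn a dict into the bytes that the topic stores."""
    return json.dumps(order, sort_keys=True).encode("utf-8")


def deserialize_order(payload):
    """Consumer side: must apply the matching decoding to get the dict back."""
    return json.loads(payload.decode("utf-8"))


order = {"order_id": 1001, "item": "espresso", "quantity": 2}
payload = serialize_order(order)      # opaque bytes as far as the topic knows
decoded = deserialize_order(payload)  # meaningful again only with the right decoder
```

If the consumer assumed a different format, `payload` would still be delivered intact but could not be interpreted, which is why the format has to be agreed on and documented.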

Shifting these responsibilities – partitioning, serialization/deserialization, compression, etc. – to producers and consumers lets a Kafka cluster focus on storage efficiency and throughput, which in turn supports better performance relative to popular legacy messaging systems.

History of Messaging

Messaging and Middleware - Then and Now

Both messaging types have been in use for decades, but some developments in this century have dramatically changed the IT landscape, including:

  • The rise of cloud-based, distributed, virtualized computing platforms
  • Lightweight synchronous communication models, such as REST APIs
  • More and different user agents and types (browsers, web/mobile apps, other managed services).

These trends put greater pressure on all modern middleware services, including messaging, to incorporate resilience, high availability, and scalability into their design. They also increase the need for real-time data ingestion and integration, and Apache Kafka has become the de facto choice for messaging that meets these needs.

Messaging with Apache Kafka

Modern Messaging with Apache Kafka

Apache Kafka uses a cluster architecture, to which an operator can add multiple brokers (servers). As topics are added, they are distributed among the brokers in the form of one or more partitions. These partitions are usually replicated across the cluster so disruptions to cluster operations are transparent to producers and consumers. Partitions are what allow Apache Kafka to scale to much higher throughput compared to other pub/sub messaging systems while maintaining low latency. Each partition is an append-only commit log, and the Kafka API allows for many individual consumer instances to form a consumer group so that each consumer reads only a subset of the partitions.
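The way a consumer group divides partitions among its members can be sketched as a simple round-robin assignment. This is an illustrative helper only: `assign_partitions` is a hypothetical function, and Kafka's actual assignment strategies are configurable and more involved.

```python
def assign_partitions(partitions, consumers):
    """Distribute partitions across group members in round-robin order.

    Hypothetical helper for illustration, not Kafka's assignor API.
    """
    ordered = sorted(consumers)
    assignment = {consumer: [] for consumer in ordered}
    for index, partition in enumerate(sorted(partitions)):
        owner = ordered[index % len(ordered)]
        assignment[owner].append(partition)
    return assignment


# Six partitions shared by two consumers, then by three.
two = assign_partitions(range(6), ["c1", "c2"])
three = assign_partitions(range(6), ["c1", "c2", "c3"])
```

With two consumers each one reads three partitions; adding a third drops that to two partitions each, which is how a group spreads read load as it grows.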

Learn more about Kafka architecture and internals.

Benefits of Pub/Sub Messaging with Kafka

Decoupling applications from each other, and allowing for future consumers to subscribe at any time, are primary motivations for using the pub/sub model. There are other benefits as well:

Faster Application Development

Pub/sub can vastly accelerate application development for new use cases. Messaging APIs make it simple for a developer to reason about and test how to issue or receive an event. There’s no need for a producer to work out the details of managing multiple consumers, and there’s no need for consumers to learn and navigate the API of a custom application. They simply use a topic’s schema as the API and start reading records.

Reliability and Availability

Apache Kafka’s cluster model bakes reliability and availability into the design. Producers and consumers can expect a robust, properly configured Kafka cluster to shield them from all but disastrous failures. Confluent extends this capability with a number of hybrid and multi-cloud solutions.

Auditability

The pub/sub model is ideal for persisting messages indefinitely, which Kafka extends by reducing message storage requirements. (Confluent’s Schema Registry can make message storage more efficient by keeping the message schema information in a separate store.) Persisted messages support auditing use cases. They also provide the basis for reconstructing the lineage of message transformations and the provenance of message sources.

Support for Multiple Clients

Kafka’s pub/sub model stores messages only as byte arrays. The producer chooses the serialization format, message schema, and message content. Consumers must know how to deserialize and interpret what producers send. The choice of programming language and tools belongs to the developer in either context.

Scalable and Elastic

You can alter various resources in a Kafka cluster – more brokers, new topics, changes in topic properties such as partition count – with little or no impact on its users. Compare this to a service application that might need a significant redesign, or even new hardware, to account for a big increase in user demand.

Pub/Sub Use Cases

Pub/sub is ideal where connecting a variety of producers to a large base of consumers is central to the application’s design: online auctions, ride services, order fulfillment, and managing inventory over multiple warehouses are a few examples. Below are some specific use cases to further illustrate.

Ordering a Meal Delivery

  • When a meal for delivery is ordered on a mobile app, the customer “publishes” their order and delivery details. The specified delivery service is a subscriber that receives orders that are keyed for them. The customer then becomes a subscriber to events detailing the status of their order.
  • The roles of message producer and consumer change with the state of the order. The application ensures that messages are routed correctly and promptly.

Event Ticketing

  • Event ticketing is an interesting case for supporting dynamic updates. Say a customer wants to review a venue’s seating chart before making a selection. The application could update seat availability from time to time to keep browsing customers apprised of availability. Some customers choose a seat right away but abandon their cart. Once the timeout window expires, that seat could appear again for browsing customers.

Fraud Detection

  • Banks and credit card companies use pub/sub models to aggregate, correlate, and inspect transactions for potential fraud. If two transactions use the same card but occur in different states within a few minutes of each other, it’s a likely sign something is wrong. The pub/sub model could be used to alert the bank’s fraud agency and the affected customer as soon as practical.
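The rule described above (same card, different states, a few minutes apart) can be sketched as a small function over a batch of transaction events. The five-minute window, the field names, and the function `find_suspicious` are all assumptions made for illustration; a production system would apply such a rule continuously to a stream rather than to a list.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=5)  # assumed threshold for "a few minutes"


def find_suspicious(transactions):
    """Return cards used in two different states within WINDOW of each other."""
    last_seen = {}  # card -> (time, state) of the previous transaction
    alerts = []
    for tx in sorted(transactions, key=lambda t: t["time"]):
        previous = last_seen.get(tx["card"])
        if previous is not None:
            prev_time, prev_state = previous
            if tx["state"] != prev_state and tx["time"] - prev_time <= WINDOW:
                alerts.append(tx["card"])
        last_seen[tx["card"]] = (tx["time"], tx["state"])
    return alerts


t0 = datetime(2024, 1, 1, 12, 0)
transactions = [
    {"card": "A", "state": "NY", "time": t0},
    {"card": "A", "state": "CA", "time": t0 + timedelta(minutes=3)},
    {"card": "B", "state": "TX", "time": t0},
    {"card": "B", "state": "TX", "time": t0 + timedelta(minutes=1)},
]
alerts = find_suspicious(transactions)
```

Card `A` is flagged because it appears in two states three minutes apart, while card `B` stays in one state and is not. In a pub/sub design, each alert would itself be published so that the fraud team and the customer notification service can both subscribe to it.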

Internet of Things (IoT)

  • IoT devices are a natural fit with pub/sub messaging. Device data such as location and state changes, coupled with the data the devices capture, provides a rich set of data points for a broad range of consumer use cases: cleaning, transforming, and enriching the data, all the way to visualizing it and training inference models with it.
  • One concrete example is remote patient monitoring. A hospital can track patient data in each room and pass it to the closest nursing station, the attending physician, and more. Patients recovering at home can use wearable devices to keep caregivers apprised of their condition.

Getting Started with Pub/Sub

Confluent makes it easy to connect your apps, data systems, and entire organizations with real-time data flows and processing. To learn more about pub/sub and our cloud-native, complete, and fully managed data streaming service, contact us. You can get started for free in minutes.