Publish/subscribe messaging, also known as pub/sub, is a messaging framework commonly used for asynchronous communication between services. It is represented by a logical component, often called a topic or routing key, that delivers the same message to multiple clients.
The counterpart to pub/sub messaging is point-to-point messaging, represented by a component called a messaging queue. A queue delivers each message to one client.
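The difference between the two delivery models can be sketched in a few lines. This is a toy in-memory model, not any real broker's API; the class names `Topic` and `Queue` and the subscriber names are hypothetical and chosen only for illustration.

```python
from collections import deque


class Topic:
    """Toy pub/sub topic: every subscriber receives every message."""

    def __init__(self):
        self.subscribers = {}  # subscriber name -> messages delivered so far

    def subscribe(self, name):
        self.subscribers[name] = []

    def publish(self, message):
        # Fan out: the same message is delivered to each subscriber.
        for inbox in self.subscribers.values():
            inbox.append(message)


class Queue:
    """Toy point-to-point queue: each message goes to exactly one consumer."""

    def __init__(self):
        self.pending = deque()

    def send(self, message):
        self.pending.append(message)

    def receive(self):
        # Taking the message removes it, so no other consumer can see it.
        return self.pending.popleft() if self.pending else None


topic = Topic()
topic.subscribe("billing")
topic.subscribe("shipping")
topic.publish("order-1")

queue = Queue()
queue.send("order-1")
first = queue.receive()   # one consumer takes the message
second = queue.receive()  # nothing is left for a second consumer
```

Here both `billing` and `shipping` end up with `order-1`, while only the first `receive()` call on the queue gets it.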
Both pub/sub messaging and messaging queues decouple the applications that send messages (producers) from the ones that receive them (consumers). This arrangement allows for an asynchronous communication model: it removes or mitigates the delays and blocking associated with direct, synchronous communication between two applications.
The role names “producer” and “consumer” describe what an application is doing in a given exchange rather than a static role in an architectural pattern (e.g., client-server). The emphasis shifts instead to the message as a time-based fact, or event.
The term “publishing” suggests that a message stays useful for an indefinite period and is intended for an arbitrary number of consumers (subscribers). These subscribers, while interested in the same messages, all have their own use cases and life cycles. It often makes sense for the topic to persist messages so that subscribers can reread the message store and so that later use cases remain possible.
Once a consumer or consumer group subscribes to a Kafka topic, it sets a position (offset) from which to read the events in the topic. Each message in a Kafka topic is an immutable record of some event. It can be deleted (e.g., when it ages out of the topic's retention window), but it cannot otherwise be modified. Each record is structured as a key-value pair.
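A minimal sketch of the offset idea, assuming a single in-memory log. `TopicLog` and `Reader` are hypothetical names, not part of the Kafka client API; the point is only that records are appended and never edited, and that each reader tracks its own position.

```python
class TopicLog:
    """Toy append-only log of (key, value) records; not the Kafka client API."""

    def __init__(self):
        self._records = []

    def append(self, key, value):
        # Records are stored as immutable tuples and never edited in place.
        self._records.append((key, value))
        return len(self._records) - 1  # offset assigned to the new record

    def read(self, offset, max_records=10):
        return self._records[offset:offset + max_records]


class Reader:
    """Tracks its own offset, so readers do not affect each other."""

    def __init__(self, log, start_offset=0):
        self.log = log
        self.offset = start_offset

    def poll(self):
        batch = self.log.read(self.offset)
        self.offset += len(batch)  # advance past what was just read
        return batch


log = TopicLog()
for i in range(3):
    log.append(f"user-{i}", f"event-{i}")

early = Reader(log, start_offset=0)  # reads the topic from the beginning
late = Reader(log, start_offset=2)   # starts at a later position
early_batch = early.poll()
late_batch = late.poll()
```

Because reading does not remove anything from the log, `early` sees all three records while `late` sees only the last one.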
If the record key exists, the producer uses it to determine the partition to which it sends the record, so that every record with a given key is always written to the same partition. Together, key-based partitioning and idempotent producers preserve ordering within a partition for all records with the same key. If the record key is null, the producer picks a partition without regard to the record's content, typically to balance load.
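The partitioning rule can be sketched as a hash of the key modulo the partition count. This is an illustration under stated assumptions: `choose_partition` and `NUM_PARTITIONS` are hypothetical names, CRC32 is a stand-in for whatever hash a real client uses, and keyless records are simply rotated through the partitions.

```python
import zlib
from itertools import count

NUM_PARTITIONS = 4        # hypothetical partition count for the topic
_round_robin = count()    # counter used only for keyless records


def choose_partition(key, num_partitions=NUM_PARTITIONS):
    if key is None:
        # No key: rotate through partitions (an assumption of this sketch).
        return next(_round_robin) % num_partitions
    # Stable hash of the key bytes, so the same key always maps to the
    # same partition. CRC32 stands in for the client's actual hash.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


same_key = [choose_partition("customer-42") for _ in range(3)]
no_key = [choose_partition(None) for _ in range(4)]
```

Every entry in `same_key` is identical, which is what keeps all records for one key in one partition, while the entries in `no_key` cover all four partitions.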
The topic receives each record from a producer in the form of a serialized byte array. Consumers must know the proper deserialization method to decode the message contents, so the data format acts as an API between producers and consumers. Producers choose the schema and serialization format, and ideally document that choice for consumers and make the schema available in a “schema registry” or searchable catalog.
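As a rough illustration of the data format acting as the contract, the sketch below serializes a record to bytes with JSON and decodes it on the other side. The `ORDER_SCHEMA` fields and the `serialize`/`deserialize` helpers are hypothetical; real deployments often use a binary format together with a schema registry instead.

```python
import json

# Hypothetical schema: the field names and types both sides agree on.
ORDER_SCHEMA = {"order_id": int, "sku": str, "quantity": int}


def serialize(record):
    """Producer side: validate against the schema, then encode to bytes."""
    for field, field_type in ORDER_SCHEMA.items():
        if not isinstance(record.get(field), field_type):
            raise ValueError(f"field {field!r} missing or not {field_type.__name__}")
    return json.dumps(record, sort_keys=True).encode("utf-8")


def deserialize(payload):
    """Consumer side: decode the bytes using the agreed format."""
    return json.loads(payload.decode("utf-8"))


payload = serialize({"order_id": 7, "sku": "A-100", "quantity": 2})
order = deserialize(payload)
```

The topic itself only ever sees `payload`, an opaque byte string. A consumer that did not know the format was UTF-8 JSON would have no way to interpret it.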
Shifting these responsibilities – partitioning, serialization/deserialization, compression, etc. – to producers and consumers lets a Kafka cluster focus on storage efficiency and throughput, which in turn supports better performance relative to popular legacy messaging systems.
Both messaging types have been in use for decades, but some developments in this century have dramatically changed the IT landscape, including:
These trends put greater pressure on all modern middleware services, including messaging, to incorporate resilience, high availability, and scalability into their design. Because of this need for real-time data ingestion and integration, Apache Kafka has become the de facto choice for messaging.
Apache Kafka uses a cluster architecture, to which an operator can add multiple brokers (servers). As topics are added, they are distributed among the brokers in the form of one or more partitions. These partitions are usually replicated across the cluster so disruptions to cluster operations are transparent to producers and consumers. Partitions are what allow Apache Kafka to scale to much higher throughput compared to other pub/sub messaging systems while maintaining low latency. Each partition is an append-only commit log, and the Kafka API allows for many individual consumer instances to form a consumer group so that each consumer reads only a subset of the partitions.
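To see how a consumer group splits the partitions of a topic, consider the simplified round-robin assignment below. The function `assign_partitions` and the consumer names are hypothetical; Kafka ships several assignment strategies, and this sketch is a stand-in for the general idea rather than any one of them.

```python
def assign_partitions(partitions, consumers):
    """Round-robin: each partition goes to exactly one member of the group."""
    ordered = sorted(consumers)
    assignment = {consumer: [] for consumer in ordered}
    for index, partition in enumerate(sorted(partitions)):
        owner = ordered[index % len(ordered)]
        assignment[owner].append(partition)
    return assignment


# Six partitions shared by a group of two, then by a group of three.
two = assign_partitions(range(6), ["c1", "c2"])
three = assign_partitions(range(6), ["c1", "c2", "c3"])
```

With two consumers each member owns three partitions; adding a third member drops that to two each, which is how a group scales its read throughput.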
Decoupling applications from each other, and allowing for future consumers to subscribe at any time, are primary motivations for using the pub/sub model. There are other benefits as well:
Pub/sub can vastly accelerate application development for new use cases. Messaging APIs make it simple for a developer to reason about and test how to issue or receive an event. There’s no need for a producer to work out the details of managing multiple consumers, and there’s no need for consumers to learn and navigate the API of a custom application. They simply use a topic’s schema as the API and start reading records.
Apache Kafka’s cluster model bakes reliability and availability into the design. Producers and consumers can expect a robust, properly configured Kafka cluster to shield them from all but disastrous failures. Confluent extends this capability with a number of hybrid and multi-cloud solutions.
The pub/sub model is well suited to persisting messages indefinitely, and Kafka helps by reducing message storage requirements. (Confluent’s Schema Registry can make message storage more efficient by keeping the message schema information in a separate store.) Persisted messages support auditing use cases. They also provide the basis for reconstructing the lineage of message transformations and the provenance of message sources.
Kafka’s pub/sub model stores messages only as byte arrays. The producer chooses the serialization format, message schema, and message content. Consumers must know how to deserialize and interpret what producers send. The choice of programming language and tools belongs to the developer in either context.
You can alter various resources in a Kafka cluster – adding brokers, creating topics, changing topic properties such as partition count – with little or no impact on its users. Compare this to a service application that might need a significant redesign, or even new hardware, to account for a big increase in user demand.
Pub/sub is ideal where connecting a variety of producers to a large base of consumers is central to the application’s design: online auctions, ride services, order fulfillment, and managing inventory over multiple warehouses are a few examples. Below are some specific use cases to further illustrate.