Numerous messaging systems exist, with use cases ranging from message queuing and distributed messaging to high-performance event streaming. Here we'll do a deep side-by-side comparison of Apache Kafka®, Apache Pulsar®, and RabbitMQ®, covering performance, architecture, features, and other differences, to help you choose the best open source messaging system.
Apache Kafka is an open source distributed event streaming platform. Based on the abstraction of a distributed commit log, Kafka is capable of handling trillions of events a day, with functionality comprising pub/sub, permanent storage, and the processing of event streams. The de facto transport for event streaming use cases, Kafka is used by thousands of organizations, from internet giants to car manufacturers to stock exchanges, and has more than 5 million lifetime downloads. Kafka is also available as a managed service on all major cloud platforms via Confluent Cloud and others.
Apache Pulsar is an open source distributed messaging system. Originally developed as a queuing system, it has been broadened in recent releases to add event streaming features. Pulsar uses Apache BookKeeper™ for its storage layer. BookKeeper was created at Yahoo as a high-availability solution for Hadoop's HDFS NameNode (although it was not ultimately used for that purpose). Pulsar shares properties with both Kafka and RabbitMQ. It is a largely community-led project with no enterprise-grade commercial backing today.
RabbitMQ is a traditional open source message-oriented middleware that implements the AMQP messaging standard. Its capabilities include queuing, exchanges, routing, and low-latency messaging. Written in Erlang, RabbitMQ is developed and commercially supported by Pivotal Software, part of VMware.
Kafka provides the highest throughput of the three systems, writing 15x faster than RabbitMQ and 2x faster than Pulsar, based on the popular OpenMessaging Benchmark.*
*Full results are described in the associated benchmarking comparison.
Kafka provides the lowest latency (5ms at p99) at higher throughputs, while also providing strong durability and high availability*.
Kafka in its default configuration is faster than Pulsar in all latency benchmarks, and it is faster up to p99.9 when set to fsync on every message.
RabbitMQ can achieve lower end-to-end latency than Kafka, but only at much lower throughputs (30K messages/sec versus 200K messages/sec for Kafka). Beyond that point, its latency degrades significantly.
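To make figures such as "5ms at p99" concrete, here is a minimal sketch of how a tail-latency percentile can be computed from a set of measurements. It is not taken from the OpenMessaging Benchmark; the `percentile` function, the nearest-rank method, and the sample data are all illustrative assumptions.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Hypothetical latencies in milliseconds: 98 fast requests and 2 slow ones.
latencies_ms = [2.0] * 98 + [40.0, 50.0]

p50 = percentile(latencies_ms, 50)  # the typical request
p99 = percentile(latencies_ms, 99)  # the tail that the median hides
```

With this made-up data the median stays at the fast value while p99 lands on one of the slow outliers, which is why messaging benchmarks tend to report tail percentiles rather than averages.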
General | Kafka | Pulsar | RabbitMQ |
---|---|---|---|
License | Apache v2 | Apache v2 | Mozilla Public License |
Components | Kafka + ZooKeeper (ZooKeeper is being removed) | Pulsar + ZooKeeper + BookKeeper + RocksDB | RabbitMQ |
Message consumption model | Pull | Push | Push |
Storage architecture | Log | Index | Index |

Ease of Use | Kafka | Pulsar | RabbitMQ |
---|---|---|---|
Operational simplicity | | | |
Documentation & learning | | | |
Open source ecosystem | | | |
Size of user community | | | |
Enterprise support | | | |
Managed cloud offerings | | | |
Management tooling built-in | | | |
Integrations (databases, REST, COTS, etc.) | | | |
Client library diversity | | | |

Performance & availability | Kafka | Pulsar | RabbitMQ |
---|---|---|---|
High-throughput workloads | | | |
Low-latency workloads | | | |
Elastic scaling | | | |
High availability | | | |
Global data replication | | | |
Ordering guarantees | | | |
Permanent storage | | | |

Features | Kafka | Pulsar | RabbitMQ |
---|---|---|---|
Built-in stream processing | | | |
Message replay, time travel | | | |
Exactly-once processing | | | |
Topic (log) compaction | | | |
Security | | | |

Use Cases | Kafka | Pulsar | RabbitMQ |
---|---|---|---|
Mission-critical | | | |
Event Streaming | | | |
Pub/sub | | | |
Message routing | | | |
Queueing | | | |
In reality, Kafka, RabbitMQ, and Pulsar are three very different systems. Kafka is a pure distributed log designed for efficient event streaming at high scale. RabbitMQ is a traditional messaging system, designed to publish messages quickly and delete them. Pulsar sits somewhere in between. It's not a distributed log in the true sense, but it synthesizes some similar properties.
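The difference between a log and a traditional queue can be illustrated with a toy, in-memory sketch. This is a deliberate simplification and not how Kafka or RabbitMQ is implemented; the `Log` and `Queue` classes and their methods are hypothetical names chosen for illustration only.

```python
from collections import deque


class Log:
    """Append-only log: records are retained, and each reader chooses
    the offset it pulls from."""

    def __init__(self):
        self._records = []

    def append(self, record):
        self._records.append(record)
        return len(self._records) - 1  # offset of the new record

    def read(self, offset, max_records=10):
        # Reading never deletes anything, so any offset can be re-read.
        return self._records[offset:offset + max_records]


class Queue:
    """Queue: a message is removed as soon as it is handed to a consumer."""

    def __init__(self):
        self._messages = deque()

    def publish(self, message):
        self._messages.append(message)

    def consume(self):
        return self._messages.popleft() if self._messages else None


log = Log()
queue = Queue()
for event in ["a", "b", "c"]:
    log.append(event)
    queue.publish(event)

first_pass = log.read(0)
replay = log.read(0)  # same data again: the log still holds it
drained = [queue.consume() for _ in range(3)]
after_drain = queue.consume()  # None: nothing is left to replay
```

In this sketch, rewinding the log to offset 0 returns the same records a second time, whereas the queue is empty once its messages have been consumed. That retained, re-readable history is what makes replay and permanent storage natural in a log-based design.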
Which to choose should be a fairly straightforward decision. For lightweight messaging that requires request-response, queuing, and pub/sub, RabbitMQ is well suited. Pulsar is really only for the brave at heart, but it may have a place in the future for those who require both queuing and event streaming in the same system. For event streaming use cases that require high throughput, scalability, and permanent message storage, Kafka is the clear winner. To learn more, check out this blog post on benchmarking Kafka, Pulsar, and RabbitMQ.