Architectural Decision Guide: When to Use Apache Kafka (And When You Shouldn't)

Your team just shipped a microservices refactor. Services are smaller, deployments are faster, and boundaries are clearer. Then, during a design review, someone inevitably suggests: “We should use Kafka.”

That suggestion might be the exact architectural breakthrough you need, or it could quietly introduce months of unnecessary operational complexity.

This article serves as a practical decision framework. We will cut through the hype to help engineering teams understand what Apache Kafka is actually built for, where it creates genuine leverage, and when simpler alternatives are the smarter choice.

What Problem Does Kafka Actually Solve?

At its core, Apache Kafka solves one highly specific infrastructure problem: reliably moving ordered streams of events between systems, at scale, and over time.

Kafka is best understood not as a traditional message queue, but as a distributed commit log. Producers append events to a log, while consumers read those events independently and at their own pace. The data is durable, ordered within each partition, and retained long enough to allow for historical replay.

This architectural model is critical when events are not transient messages but permanent records of an action, such as an order being placed, a payment clearing, or a user updating their profile. These events often need to be consumed by multiple disparate systems, sometimes long after the original event was produced.

The Commit Log Model in 60 Seconds

To understand Kafka's power, you must understand its append-only log model:

[Figure: producers append events to a Kafka topic, while multiple consumers read the same log independently using their own offsets, enabling replay.]

  • Kafka operates as a set of append-only logs, which are split into partitions to enable horizontal scaling.

  • Producers append new events directly to the end of a partition.

  • Consumers read the data sequentially and track their exact position using offsets.

  • Offsets are controlled entirely by the consumer, which is what allows for event rewinding and replayability.

  • Unlike traditional message queues where a message is deleted once consumed, Kafka retains data for a configured retention period, regardless of whether downstream systems have read it.

Kafka acts as the nervous system of a distributed architecture. It is fundamentally not a database, an API gateway, or a background job queue.
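
The log model described above can be illustrated with a small in-memory sketch. This is a toy, not Kafka's client API: the `PartitionLog` and `Consumer` classes and the event names are hypothetical, and a single Python list stands in for one partition. It only aims to show that reads do not delete data and that each consumer owns its own offset.

```python
from dataclasses import dataclass, field


@dataclass
class PartitionLog:
    """Toy single-partition, append-only log (illustrative, not Kafka's API)."""
    records: list = field(default_factory=list)

    def append(self, event) -> int:
        """Append an event to the end of the log and return its offset."""
        self.records.append(event)
        return len(self.records) - 1

    def read(self, offset: int, max_records: int = 10) -> list:
        """Return up to max_records events starting at offset; nothing is deleted."""
        return self.records[offset:offset + max_records]


@dataclass
class Consumer:
    """Toy consumer that tracks its own position in the log."""
    log: PartitionLog
    offset: int = 0

    def poll(self, max_records: int = 10) -> list:
        batch = self.log.read(self.offset, max_records)
        self.offset += len(batch)  # only this consumer's position advances
        return batch

    def seek(self, offset: int) -> None:
        """Rewind (or skip ahead) by moving the consumer-owned offset."""
        self.offset = offset


log = PartitionLog()
for event in ["order_placed", "payment_cleared", "profile_updated"]:
    log.append(event)

warehouse = Consumer(log)
fraud = Consumer(log)

first_read = warehouse.poll()            # warehouse reads everything available
fraud_read = fraud.poll(max_records=1)   # fraud reads at its own pace

warehouse.seek(0)                        # rewind to replay from the start
replayed = warehouse.poll()
```

After the rewind, `replayed` contains the same events as `first_read`, and the `fraud` consumer's position is unaffected. A real cluster adds partitions, replication, and retention limits on top of this idea.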

When to Use Apache Kafka

The easiest way to determine if Kafka belongs in your stack is to evaluate it against concrete operational scenarios. Kafka provides high engineering leverage in the following situations:

1. You Need Ordered, Durable Event Streams

  • Ideal Scenarios: Financial transactions, audit logging, and Change Data Capture (CDC).

  • The Leverage: Kafka guarantees ordering within a partition, and records that share a key are routed to the same partition, so per-key order is preserved. Data is persisted to disk with configurable replication, so events survive broker failures and are consumed in the order they were written.
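
To make the per-key ordering idea concrete, here is a minimal sketch of key-based partitioning. It assumes a hypothetical three-partition topic and uses CRC32 as a stand-in hash; Kafka's default partitioner uses a different hash function (murmur2), so the actual partition numbers would differ. The account keys and payloads are made up for illustration.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition deterministically.

    Assumption: CRC32 is a stand-in; Kafka's default partitioner
    hashes keys with murmur2 instead.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


NUM_PARTITIONS = 3  # hypothetical topic size
partitions = {p: [] for p in range(NUM_PARTITIONS)}

events = [
    ("account-1", "deposit 100"),
    ("account-2", "deposit 50"),
    ("account-1", "withdraw 30"),
    ("account-2", "withdraw 10"),
    ("account-1", "close"),
]

# Each event is appended to the partition chosen by its key.
for key, payload in events:
    partitions[partition_for(key, NUM_PARTITIONS)].append((key, payload))


def events_for(key: str) -> list:
    """Read back one key's events from the single partition it maps to."""
    p = partition_for(key, NUM_PARTITIONS)
    return [payload for k, payload in partitions[p] if k == key]
```

Because every event for `account-1` lands in the same partition, `events_for("account-1")` returns them in the order they were produced. Events for different keys may sit in different partitions, and no ordering is implied between those.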

2. Multiple Consumers Need the Same Data (Fan-Out)

  • Ideal Scenarios: An OrderCreated event needs to simultaneously feed a data warehouse, a fraud detection system, and a fulfillment service.

  • The Leverage: Kafka allows massive fan-out without duplicating data. Each consumer group receives the full stream and tracks its offsets independently.

3. You Need to Replay or Reprocess Events

  • Ideal Scenarios: A critical bug is discovered in a downstream consumer that has been running silently for weeks.

  • The Leverage: Because data retention is decoupled from consumption, you can easily rewind consumer offsets and reprocess historical data without involving the original producers or restoring database backups.

4. You Are Building Real-Time Data Pipelines

  • Ideal Scenarios: Streaming data from Postgres to Kafka, and then routing it to Elasticsearch and a cloud data warehouse.

  • The Leverage: With Kafka Connect and its connector ecosystem (such as Debezium for CDC or JDBC sink connectors), Kafka becomes more than a simple pipe: it serves as a streaming data platform.

5. Your Throughput is Non-Trivial

  • Ideal Scenarios: Sustained event volumes exceeding ~10,000 events per second.

  • The Leverage: Kafka is optimized for high-throughput, low-latency streaming through sequential disk I/O, record batching, and zero-copy data transfer.
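
To get a feel for what ~10,000 events per second means in practice, a back-of-the-envelope estimate can help. The sketch below is only arithmetic. The 1 KB average event size, 7-day retention, and replication factor of 3 are assumptions chosen for illustration and do not come from this article. It also ignores compression, index overhead, and headroom.

```python
def estimate_kafka_sizing(events_per_sec, avg_event_bytes,
                          retention_days, replication_factor):
    """Rough sizing; ignores compression, indexes, and capacity headroom."""
    ingress_bytes_per_sec = events_per_sec * avg_event_bytes
    bytes_per_day = ingress_bytes_per_sec * 86_400  # seconds per day
    retained_bytes = bytes_per_day * retention_days * replication_factor
    return {
        "ingress_mb_per_sec": ingress_bytes_per_sec / 1e6,
        "gb_per_day": bytes_per_day / 1e9,
        "retained_tb": retained_bytes / 1e12,
    }


# Assumed inputs: 10,000 events/s, 1,000-byte events, 7 days, replication 3.
estimate = estimate_kafka_sizing(
    events_per_sec=10_000,
    avg_event_bytes=1_000,
    retention_days=7,
    replication_factor=3,
)
```

Under these assumptions the numbers work out to about 10 MB/s of ingress, 864 GB written per day, and roughly 18 TB of retained data across replicas. Swapping in your own event size and retention window is usually more informative than the event count alone.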

When NOT to Use Apache Kafka

Good architecture is just as much about what you choose not to adopt. In the following scenarios, Kafka is likely over-engineering.

  • You Just Need a Simple Task Queue: If your workload involves tasks like sending emails, resizing images, or processing background jobs (a "process-and-delete" model), Kafka's partitions and offsets add unnecessary complexity.

  • Your Scale is Small (and Will Stay Small): If you process hundreds of messages per minute, Kafka's operational overhead rarely justifies the cost.

  • You Need Synchronous "Request-Reply" Messaging: Kafka is strictly asynchronous. If your system relies on sending a request and waiting for an immediate response, protocols like HTTP or gRPC are the correct choice.
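
For contrast with the log model, the "process-and-delete" pattern from the first bullet can be sketched with Python's standard library. This is an in-process stand-in for a real broker or managed queue, and the task names and worker count are hypothetical. Each task is removed from the queue when a worker takes it, so it is handled by exactly one worker and cannot be replayed afterwards.

```python
import queue
import threading

# Hypothetical background jobs waiting to be processed.
tasks = queue.Queue()
for n in range(6):
    tasks.put(f"resize image-{n}.png")

processed = []            # (worker name, task) pairs, for inspection
lock = threading.Lock()   # protects the shared 'processed' list


def worker(name: str) -> None:
    """Take tasks until the queue is empty; each get removes the task."""
    while True:
        try:
            task = tasks.get_nowait()
        except queue.Empty:
            return
        with lock:
            processed.append((name, task))
        tasks.task_done()


workers = [threading.Thread(target=worker, args=(f"worker-{i}",)) for i in range(2)]
for w in workers:
    w.start()
for w in workers:
    w.join()
```

Which worker handles which task varies from run to run, but every task is processed once and the queue ends up empty. If that is all a workload needs, offsets, partitions, and retention add little.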

The 5-Question Litmus Test

Use this quick gut-check based on your current requirements, not hypothetical future scale:

  1. Do multiple independent consumers need access to the exact same event stream?

  2. Do you have a strict requirement to replay or reprocess historical events?

  3. Is your sustained, steady throughput greater than ~10,000 events per second?

  4. Do you require strict ordering within a partition key?

  5. Are you building a data pipeline (CDC, analytics, ML features) rather than point-to-point messaging?

How to score your architecture:

  • 0–1 "Yes": Use a simpler queue or database-backed solution.

  • 2–3 "Yes": Evaluate carefully; Kafka might be justified, but simpler alternatives remain viable.

  • 4–5 "Yes": Apache Kafka is very likely the correct architectural choice.
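
The scoring rules above are simple enough to encode directly, which can be handy in a design-review template. The sketch below is one possible encoding. The class, field, and function names are hypothetical, and the returned strings are shorthand for the three outcomes listed above.

```python
from dataclasses import dataclass, fields


@dataclass
class LitmusAnswers:
    """One boolean per litmus-test question, in the order listed above."""
    multiple_independent_consumers: bool
    needs_replay: bool
    sustained_throughput_over_10k: bool
    needs_per_key_ordering: bool
    building_data_pipeline: bool


def count_yes(answers: LitmusAnswers) -> int:
    """Count how many of the five questions were answered 'yes'."""
    return sum(bool(getattr(answers, f.name)) for f in fields(answers))


def recommend(answers: LitmusAnswers) -> str:
    """Map the number of 'yes' answers to the article's three bands."""
    yes = count_yes(answers)
    if yes <= 1:
        return "simpler queue or database-backed solution"
    if yes <= 3:
        return "evaluate carefully"
    return "Kafka very likely"


# Hypothetical example: fan-out and replay are needed, nothing else.
example = LitmusAnswers(
    multiple_independent_consumers=True,
    needs_replay=True,
    sustained_throughput_over_10k=False,
    needs_per_key_ordering=False,
    building_data_pipeline=False,
)
result = recommend(example)
```

With two "yes" answers, the example lands in the middle band. The answers should reflect current requirements, as the test itself stresses.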

Side-by-Side Comparison: Kafka vs. Alternatives

Choosing Kafka is about finding the best trade-off for your specific workload.

Kafka vs. RabbitMQ

RabbitMQ is optimized for traditional work queues and RPC-style synchronous messaging. Kafka wins when events must be shared, replayed, and retained long-term.

| Feature | Apache Kafka | RabbitMQ |
| --- | --- | --- |
| Ordering | Guaranteed per partition key. | Per-queue ordering (can degrade with scaling). |
| Throughput | Designed for sustained >10K events/sec. | Moderate; optimized for low latency. |
| Consumer Model | Fan-out via independent consumer groups. | Competing consumers (one message to one consumer). |
| Replay | Native support via offset rewinding. | Not natively supported. |

Kafka vs. AWS SQS / SNS

Amazon SQS is the lowest-ops solution for simple cloud queuing. Kafka becomes compelling when you need replay capabilities, long-term retention, or sustained high throughput.

| Feature | Apache Kafka | AWS SQS / SNS |
| --- | --- | --- |
| Ordering | Strong per-partition ordering. | Best-effort on standard queues; FIFO queues have throughput limits. |
| Message Retention | Configurable from days to months. | Limited to a maximum of 14 days (SQS). |
| Replay | Native. | Requires manual re-publishing. |
| Pricing Model | Infrastructure-based or managed capacity. | Per-request pricing. |

Can I Just Use Postgres?

For small-scale pub/sub, PostgreSQL's LISTEN/NOTIFY or an outbox table built on Postgres can work very well. As a rule of thumb, keep this approach to fewer than ~5 consumers and under ~1,000 messages per second. Graduate to Kafka when scale or fan-out requirements exceed what your database can comfortably handle.
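
As a rough illustration of the outbox pattern mentioned above, the sketch below writes a business row and its event in a single transaction, then has a relay publish pending events in order. To keep it self-contained it uses SQLite from the Python standard library as a stand-in for Postgres. The table names, columns, and the `publish` callback are hypothetical.

```python
import json
import sqlite3

# SQLite in-memory database as a stand-in for Postgres (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, total_cents INTEGER NOT NULL);
    CREATE TABLE outbox (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        event_type TEXT NOT NULL,
        payload TEXT NOT NULL,
        published INTEGER NOT NULL DEFAULT 0
    );
""")


def place_order(order_id: int, total_cents: int) -> None:
    """Write the order and its event together: both commit or neither does."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(
            "INSERT INTO orders (id, total_cents) VALUES (?, ?)",
            (order_id, total_cents),
        )
        conn.execute(
            "INSERT INTO outbox (event_type, payload) VALUES (?, ?)",
            ("OrderCreated",
             json.dumps({"order_id": order_id, "total_cents": total_cents})),
        )


def relay_once(publish) -> int:
    """Publish pending events in insertion order, marking each as published."""
    rows = conn.execute(
        "SELECT id, event_type, payload FROM outbox "
        "WHERE published = 0 ORDER BY id"
    ).fetchall()
    for row_id, event_type, payload in rows:
        publish(event_type, json.loads(payload))
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
    return len(rows)


delivered = []
place_order(1, 4999)
place_order(2, 1250)
sent = relay_once(lambda event_type, payload: delivered.append((event_type, payload)))
```

If the relay crashes after publishing but before marking a row, that event is sent again on the next pass, so consumers should tolerate duplicates. With a handful of consumers this is often enough. Many independent readers polling the same table is where the database starts to strain.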

Self-Managed vs. Managed Kafka

If Kafka passes your litmus test, your next decision is operational deployment. Both self-managed and managed Kafka are legitimate choices, heavily dependent on your team's capacity and constraints.

When to Choose Self-Managed Kafka

Self-managed Kafka is a reasonable choice if:

  • You have a dedicated platform team with real Kafka operational experience.

  • You require deep JVM-level tuning, custom interceptors, or non-standard authentication.

  • Regulatory, data sovereignty, or compliance rules require air-gapped or on-premises deployments.

The Hidden Costs of Self-Managing: Be aware of the Total Cost of Ownership (TCO). Production Kafka usually requires 0.5 to 2 full-time engineers for patching, upgrades, and incident response. Major architecture shifts, such as migrating from ZooKeeper to KRaft, require careful planning and a rollback strategy.

When to Choose Managed Kafka (e.g., Confluent Cloud)

Managed Kafka is a force multiplier if:

  • You want your engineering teams focused on building product features rather than running distributed infrastructure.

  • You lack a dedicated Kafka operations team.

  • You require ecosystem features like Schema Registry, Kafka Connect, or stream processing natively managed.

  • Built-in RBAC, audit logging, and predictable costs are business priorities.

Decision Tree — Should You Use Kafka?

This decision tree translates everything covered so far into a practical architecture choice. Start at the top, answer honestly based on current requirements, and follow the path to a recommendation. The goal is not to push Kafka—it’s to help you land on the lowest-complexity tool that still meets your needs.
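
The function below is one possible way to encode the guidance from the earlier sections as code. It is a sketch rather than a reproduction of the decision tree: the order of the checks, the parameter names, and the returned labels are all assumptions made for illustration.

```python
def recommend_architecture(
    needs_request_reply: bool,
    is_process_and_delete_queue: bool,
    litmus_yes_count: int,
    requires_on_prem: bool = False,
    has_kafka_platform_team: bool = False,
    needs_deep_customization: bool = False,
) -> str:
    """Walk the article's guidance from simplest tool to most complex.

    Assumption: this ordering of checks is illustrative only.
    """
    if needs_request_reply:
        return "HTTP or gRPC"
    if is_process_and_delete_queue:
        return "simple task queue"
    if litmus_yes_count <= 1:
        return "simpler queue or Postgres-backed solution"
    if litmus_yes_count <= 3:
        return "evaluate Kafka against simpler alternatives"
    # From here on, Kafka is very likely justified; pick a deployment model.
    if requires_on_prem:
        return "self-managed Kafka"
    if has_kafka_platform_team and needs_deep_customization:
        return "self-managed Kafka"
    return "managed Kafka"


# Hypothetical team: all five litmus answers are "yes", no platform team.
choice = recommend_architecture(
    needs_request_reply=False,
    is_process_and_delete_queue=False,
    litmus_yes_count=5,
)
```

The checks that rule Kafka out come first, which mirrors the goal of landing on the lowest-complexity tool that still meets the requirements.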

Key Takeaways

  • Apache Kafka solves a specific class of problems: ordered, durable, replayable event streams at scale. It should not be your default messaging tool.

  • If you just need a task queue, use simpler systems like managed queues or traditional message brokers. They are cheaper, easier to operate, and better aligned with one-time work.

  • If your scale is small, starting with Postgres-backed queuing or pub/sub is often the most pragmatic choice. You can graduate to Kafka when you hit clear limits.

  • Kafka earns its complexity when multiple consumers need the same data, events must be replayed, or throughput is sustained and high.

  • Self-managed Kafka is a legitimate choice if you have the team, expertise, and constraints to support it—but be honest about the total cost of operations.

  • Managed Kafka makes sense when your priority is shipping features, not running infrastructure, and your requirements don’t demand deep internal customization.

Mohtasham is an Associate Solutions Architect at Confluent, where he focuses on enabling organizations to build scalable, real-time data platforms using technologies like Apache Kafka, Apache Flink, and Kubernetes. With deep expertise in AI, cloud infrastructure, and event-driven architecture, he helps customers unlock the full potential of data streaming. Mohtasham is multi-cloud certified and actively engaged in the cloud community, where he shares his insights and supports knowledge sharing across cloud-native and data engineering spaces.
