Message Queue Comparison
Comparing Kafka, RabbitMQ, and SQS for different use cases.
Message queues are the nervous system of a distributed application. But choosing between Kafka, RabbitMQ, and SQS isn’t about which is “best” — it’s about which trade-offs match your workload.
After running all three in production, here’s a practical decision framework based on real operational experience.
We benchmarked each system on identical hardware (c6i.4xlarge, 16 vCPU, 32GB RAM):
| Metric | Kafka | RabbitMQ | SQS |
|---|---|---|---|
| Max publish rate | 800K msg/s | 50K msg/s | 3K msg/s |
| Max consume rate | 1M msg/s | 40K msg/s | 3K msg/s |
| Avg latency (p99) | 10ms | 5ms | 250ms |
| Message size limit | 1MB | 128MB | 256KB |
| Retention | Unlimited | Until ack’d | 14 days max |
Kafka dominates on throughput. RabbitMQ wins on latency. SQS is the slowest but requires zero infrastructure management.
Ordering is where the differences become stark:
```python
# Kafka: ordering per partition key
producer.send(
    topic='orders',
    key=b'customer_123',  # Same key = same partition = ordered
    value=b'{"event": "order.created", "customer": "123"}',
)
producer.send(
    topic='orders',
    key=b'customer_123',
    value=b'{"event": "payment.received", "customer": "123"}',
)
# Guaranteed: order.created is consumed before payment.received
```
```python
# RabbitMQ: no ordering guarantee across consumers
channel.basic_publish(
    exchange='orders',
    routing_key='order_events',
    body=b'{"event": "order.created"}',
)
# Multiple consumers may receive messages out of order
```
```python
# SQS: standard queues have no ordering; FIFO queues are ordered
# but throughput-limited
sqs.send_message(
    QueueUrl=FIFO_QUEUE_URL,
    MessageBody='{"event": "order.created"}',
    MessageGroupId='customer_123',  # FIFO only
)
```

Ordering summary:
| System | Ordering | Guarantee |
|---|---|---|
| Kafka | Per partition key | Strict within partition |
| RabbitMQ | Per queue | Strict if single consumer |
| SQS Standard | None | Best effort only |
| SQS FIFO | Per message group | Strict within group |
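The Kafka row in this table works because the partitioner is deterministic: the same key always hashes to the same partition, so all events for one customer stay in publish order. A toy sketch of the idea (Kafka's default partitioner actually uses murmur2; `crc32` here just stands in as a deterministic hash):

```python
# Illustrative sketch of key-based partitioning, not Kafka's real partitioner.
import zlib

def pick_partition(key: bytes, num_partitions: int) -> int:
    # A deterministic hash: the same key always maps to the same partition,
    # so per-key ordering holds within that partition.
    return zlib.crc32(key) % num_partitions

p1 = pick_partition(b'customer_123', 12)
p2 = pick_partition(b'customer_123', 12)
assert p1 == p2  # both events land on the same partition, in publish order
```

The flip side: one hot key means one hot partition, so throughput for that key is capped at a single partition's capacity.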
Operational overhead is often the deciding factor:
```
# Minimum production Kafka setup:
# - 3 ZooKeeper nodes (or KRaft mode)
# - 3+ Kafka brokers
# - Topic management
# - Partition rebalancing
# - Consumer group coordination
```

Kafka has the highest operational overhead. You’re managing a distributed system with many moving parts. Confluent Cloud reduces this but at a premium cost.
```
# RabbitMQ setup:
# - 3-node cluster (mirrored queues)
# - Queue/exchange declarations
# - Vhost management
```

RabbitMQ is simpler than Kafka but still requires cluster management. The management UI is excellent for debugging.
```python
# SQS setup:
import boto3

sqs = boto3.client('sqs')
queue = sqs.create_queue(QueueName='my-queue')
# That's it. No infrastructure to manage.
```

SQS has zero operational overhead. You don’t manage servers, clusters, or partitions. The trade-off is less control and higher per-message cost at scale.
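On the consuming side, the usual SQS pattern is long polling plus explicit deletes. A minimal sketch (`drain_queue` and `handler` are illustrative names, not boto3 APIs):

```python
# Long-polling SQS consumer sketch. Pass any boto3 SQS client as `sqs`.
def drain_queue(sqs, queue_url, handler, max_batches=1):
    # max_batches bounds the loop for demonstration; a real worker loops forever.
    processed = 0
    for _ in range(max_batches):
        resp = sqs.receive_message(
            QueueUrl=queue_url,
            MaxNumberOfMessages=10,  # up to 10 messages per API call
            WaitTimeSeconds=20,      # long polling cuts empty responses
        )
        for msg in resp.get('Messages', []):
            handler(msg['Body'])
            # Delete only after successful handling; otherwise the message
            # reappears once its visibility timeout expires.
            sqs.delete_message(QueueUrl=queue_url,
                               ReceiptHandle=msg['ReceiptHandle'])
            processed += 1
    return processed
```

Deleting only after the handler succeeds means a crashed worker’s messages reappear after the visibility timeout, which is SQS’s at-least-once redelivery mechanism.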
Choose Kafka when:

- You need high throughput (100K+ messages/second)
- Event replay is valuable (new service onboarding, debugging)
- You need streaming analytics (Kafka Streams, ksqlDB)
- You can handle operational complexity
```python
# Kafka excels at event streaming
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    'orders.events',
    bootstrap_servers=['kafka:9092'],
    auto_offset_reset='earliest',  # Can replay from beginning
    group_id='analytics-service',
)

for message in consumer:
    process_event(message.value)
```

Choose RabbitMQ when:

- You need complex routing patterns (topic exchanges, headers)
- Low latency is critical (< 10ms)
- You need message TTL, dead letter queues, or priority queues
- Moderate throughput (under 50K messages/second)
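The TTL, dead letter, and priority features above are enabled through queue arguments at declaration time. A sketch using pika (the queue and exchange names are illustrative, and an open `channel` is assumed):

```python
def declare_task_queue(channel):
    # Dead letter exchange: rejected or expired messages are re-routed here.
    channel.exchange_declare(exchange='dlx', exchange_type='fanout')
    channel.queue_declare(queue='failed_tasks')
    channel.queue_bind(exchange='dlx', queue='failed_tasks')

    dead_letter_args = {
        'x-dead-letter-exchange': 'dlx',  # where rejected messages go
        'x-message-ttl': 60_000,          # expire unconsumed messages after 60s
        'x-max-priority': 10,             # enable priority levels 0-10
    }
    channel.queue_declare(queue='tasks', arguments=dead_letter_args)
    return dead_letter_args

# A consumer then dead-letters a message by rejecting without requeue:
# channel.basic_nack(delivery_tag=method.delivery_tag, requeue=False)
```

Note that these `x-` arguments are fixed once the queue exists; redeclaring with different arguments fails, so they need to be right from the start.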
```python
# RabbitMQ excels at complex routing
channel.exchange_declare(exchange='orders', exchange_type='topic')
channel.queue_bind(exchange='orders', queue='email_notifications',
                   routing_key='order.*.created')
channel.queue_bind(exchange='orders', queue='inventory_updates',
                   routing_key='order.*.shipped')
```

Choose SQS when:

- You want zero infrastructure management
- Throughput needs are moderate (under 3K messages/second)
- You’re already on AWS
- You don’t need event replay
```python
# SQS excels at simple task queues
sqs.send_message(
    QueueUrl=QUEUE_URL,
    MessageBody=json.dumps({'task': 'send_email', 'to': 'user@example.com'}),
    DelaySeconds=60,  # Built-in delay queue
)
```

We use all three, each for its strengths:
Event Streaming (Kafka):

- Order events (replay needed for new services)
- User activity tracking (high throughput)
- Audit log (unlimited retention)
Task Distribution (RabbitMQ):

- Email/SMS notifications (complex routing)
- Report generation (priority queues)
- Webhook delivery (dead letter queues)
Simple Queues (SQS):

- Lambda triggers (native AWS integration)
- Background jobs (low throughput, zero ops)
- Batch processing (visibility timeouts)

Takeaways:

- Match the tool to the workload — no single queue handles everything well
- Throughput needs change — design for 10x your current load
- Ordering is expensive — only require it when you actually need it
- Dead letter queues are mandatory — failed messages need somewhere to go
- Monitor queue depth — growing queues are your earliest failure indicator
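As one concrete version of that last point, SQS exposes its backlog through `GetQueueAttributes`; a small polling sketch (the `queue_depth` helper is an illustrative name):

```python
def queue_depth(sqs, queue_url):
    # ApproximateNumberOfMessages is SQS's backlog gauge; a depth that
    # trends upward across polls means consumers are falling behind.
    attrs = sqs.get_queue_attributes(
        QueueUrl=queue_url,
        AttributeNames=['ApproximateNumberOfMessages'],
    )
    return int(attrs['Attributes']['ApproximateNumberOfMessages'])
```

Alert on the trend, not a single reading. The equivalent signal is consumer group lag for Kafka and per-queue depth from the management API for RabbitMQ.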
Questions about message queues? Find me on GitHub or Twitter.