Apache Kafka for Beginners: Stream Processing in 2026
Learn Apache Kafka from scratch. Producers, consumers, topics, partitions, and stream processing explained with Docker setup and real code examples.
What Is Apache Kafka?
Apache Kafka is a distributed event streaming platform. Think of it as an extremely fast, durable, append-only log that multiple applications can write to and read from simultaneously. It handles millions of events per second and stores them reliably across a cluster.
If you have ever needed to move data between systems in real time — user events to analytics, orders to fulfillment, logs to monitoring — Kafka is the tool.
Core Concepts
Topics
A topic is a named stream of events. Think of it as a database table, but append-only. You cannot update or delete events; you can only add new ones.
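In spirit (ignoring persistence, partitioning, and replication), a topic behaves like this toy append-only list — an illustrative sketch only, not how Kafka is actually implemented:

```python
# Toy model of a topic: an append-only list of events with offsets.
# Illustrative only -- a real topic is a partitioned, replicated commit log.
class ToyTopic:
    def __init__(self, name):
        self.name = name
        self.events = []  # append-only; never updated or deleted in place

    def append(self, event):
        offset = len(self.events)  # each event gets the next sequential offset
        self.events.append(event)
        return offset

    def read_from(self, offset):
        return self.events[offset:]  # consumers read forward from an offset

topic = ToyTopic("user-events")
topic.append({"user": "alice", "action": "login"})
topic.append({"user": "bob", "action": "purchase"})
print(topic.read_from(1))  # [{'user': 'bob', 'action': 'purchase'}]
```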
Topic: "user-events"
Event 1: {user: "alice", action: "login", timestamp: "..."}
Event 2: {user: "bob", action: "purchase", timestamp: "..."}
Event 3: {user: "alice", action: "logout", timestamp: "..."}
Partitions
Topics are split into partitions for parallelism. Events with the same key always go to the same partition (guaranteeing order per key).
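The key-to-partition rule boils down to `hash(key) % num_partitions`. Kafka's default partitioner uses a murmur2 hash of the key bytes; this sketch substitutes `zlib.crc32` just to show the shape of the mapping:

```python
import zlib

# Sketch of keyed partitioning: hash(key) % num_partitions.
# Kafka's default partitioner uses murmur2; crc32 here is only a stand-in.
def partition_for(key: str, num_partitions: int) -> int:
    return zlib.crc32(key.encode()) % num_partitions

# The same key always maps to the same partition, so per-key order holds.
for key in ["CUST-001", "CUST-002", "CUST-001"]:
    print(key, "-> partition", partition_for(key, 3))
```

Because the mapping is deterministic, the first and third events above land on the same partition, which is exactly what guarantees per-customer ordering.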
Topic: "orders" (3 partitions)
Partition 0: events whose keys hash to 0 (e.g. customers C2, C5, ...)
Partition 1: events whose keys hash to 1 (e.g. customers C1, C3, ...)
Partition 2: events whose keys hash to 2 (e.g. customers C0, C4, ...)
Producers and Consumers
Producers write events. Consumers read events. Consumer groups allow parallel processing — each consumer in a group reads from different partitions.
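The "each consumer reads different partitions" invariant can be sketched like this. The real assignment is negotiated by the group coordinator using a configurable strategy (range, round-robin, or sticky); this toy version just round-robins partitions across hypothetical worker names:

```python
# Sketch of how a consumer group splits partitions among its members.
# Real assignment is done by Kafka's group coordinator; the invariant shown
# here is the key point: each partition has exactly one owner per group.
def assign(partitions: list[int], consumers: list[str]) -> dict[str, list[int]]:
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)  # round-robin
    return assignment

print(assign([0, 1, 2, 3, 4, 5], ["worker-a", "worker-b", "worker-c"]))
# {'worker-a': [0, 3], 'worker-b': [1, 4], 'worker-c': [2, 5]}
```

Note the corollary: with 6 partitions, a 7th consumer in the group would sit idle, which is why partition count caps a group's parallelism.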
Docker Setup
Get Kafka running in 60 seconds with KRaft mode (no ZooKeeper needed since Kafka 3.3):
# docker-compose.yml
version: '3.8'
services:
  kafka:
    image: apache/kafka:3.7.0
    container_name: kafka
    ports:
      - "9092:9092"
    environment:
      KAFKA_NODE_ID: 1
      KAFKA_PROCESS_ROLES: broker,controller
      # Two data listeners: one advertised to the host, one for other containers.
      # Advertising only localhost:9092 would break clients inside the Docker network.
      KAFKA_LISTENERS: PLAINTEXT_HOST://0.0.0.0:9092,PLAINTEXT://0.0.0.0:29092,CONTROLLER://0.0.0.0:9093
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT_HOST://localhost:9092,PLAINTEXT://kafka:29092
      KAFKA_CONTROLLER_QUORUM_VOTERS: 1@kafka:9093
      KAFKA_CONTROLLER_LISTENER_NAMES: CONTROLLER
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT
      KAFKA_INTER_BROKER_LISTENER_NAME: PLAINTEXT
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
      KAFKA_LOG_RETENTION_HOURS: 168  # 7 days
    volumes:
      - kafka_data:/var/lib/kafka/data
  kafka-ui:
    image: provectuslabs/kafka-ui:latest
    container_name: kafka-ui
    ports:
      - "8080:8080"
    environment:
      KAFKA_CLUSTERS_0_NAME: local
      KAFKA_CLUSTERS_0_BOOTSTRAPSERVERS: kafka:29092
    depends_on:
      - kafka
volumes:
  kafka_data:
docker compose up -d
# Kafka UI available at http://localhost:8080
Producing Events (Python)
from confluent_kafka import Producer
import json
import time
producer = Producer({
    'bootstrap.servers': 'localhost:9092',
    'client.id': 'order-service'
})

def delivery_callback(err, msg):
    if err:
        print(f"Delivery failed: {err}")
    else:
        print(f"Delivered to {msg.topic()} [{msg.partition()}] @ {msg.offset()}")

# Produce events
for i in range(100):
    event = {
        "order_id": f"ORD-{i:04d}",
        "customer_id": f"CUST-{i % 10:03d}",
        "amount": round(10 + i * 1.5, 2),
        "timestamp": time.time()
    }
    producer.produce(
        topic="orders",
        key=event["customer_id"],  # Same customer → same partition
        value=json.dumps(event),
        callback=delivery_callback
    )

producer.flush()  # Wait for all deliveries
print("All events produced")
Consuming Events (Python)
from confluent_kafka import Consumer
import json
consumer = Consumer({
    'bootstrap.servers': 'localhost:9092',
    'group.id': 'order-processor',
    'auto.offset.reset': 'earliest',
    'enable.auto.commit': False  # Manual commit for reliability
})
consumer.subscribe(['orders'])

try:
    while True:
        msg = consumer.poll(timeout=1.0)
        if msg is None:
            continue
        if msg.error():
            print(f"Error: {msg.error()}")
            continue
        event = json.loads(msg.value())
        print(f"Processing order {event['order_id']}: {event['amount']}")
        # Process the event (save to DB, trigger fulfillment, etc.)
        process_order(event)
        # Commit offset after successful processing
        consumer.commit(asynchronous=False)
except KeyboardInterrupt:
    pass
finally:
    consumer.close()
Producing Events (Node.js)
import { Kafka } from 'kafkajs';
const kafka = new Kafka({
  clientId: 'notification-service',
  brokers: ['localhost:9092']
});
const producer = kafka.producer();

async function sendNotification(userId, message) {
  await producer.connect();
  await producer.send({
    topic: 'notifications',
    messages: [
      {
        key: userId,
        value: JSON.stringify({
          userId,
          message,
          channel: 'email',
          timestamp: new Date().toISOString()
        })
      }
    ]
  });
}

// Consumer
const consumer = kafka.consumer({ groupId: 'email-sender' });

async function startEmailConsumer() {
  await consumer.connect();
  await consumer.subscribe({ topic: 'notifications', fromBeginning: true });
  await consumer.run({
    eachMessage: async ({ topic, partition, message }) => {
      const notification = JSON.parse(message.value.toString());
      console.log(`Sending email to ${notification.userId}: ${notification.message}`);
      await sendEmail(notification);
    }
  });
}
Common Patterns
Event Sourcing
Store every state change as an event. Reconstruct current state by replaying events:
Topic: "account-events"
{type: "CREATED", accountId: "A1", balance: 0}
{type: "DEPOSITED", accountId: "A1", amount: 1000}
{type: "WITHDRAWN", accountId: "A1", amount: 200}
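Replaying the stream above can be sketched in a few lines of Python (the event dicts mirror the example):

```python
# Rebuild an account's balance by replaying its events in order.
events = [
    {"type": "CREATED", "accountId": "A1", "balance": 0},
    {"type": "DEPOSITED", "accountId": "A1", "amount": 1000},
    {"type": "WITHDRAWN", "accountId": "A1", "amount": 200},
]

def replay(events):
    balance = 0
    for e in events:
        if e["type"] == "CREATED":
            balance = e["balance"]       # initial state
        elif e["type"] == "DEPOSITED":
            balance += e["amount"]
        elif e["type"] == "WITHDRAWN":
            balance -= e["amount"]
    return balance

print(replay(events))  # 800
```

Because Kafka retains the events, you can replay from offset 0 at any time to rebuild state or to populate a brand-new read model.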
// Current balance: replay → 0 + 1000 - 200 = 800
CQRS (Command Query Responsibility Segregation)
Write to Kafka, consume into read-optimized stores:
Write path: API → Kafka "orders" topic
Read path: Kafka consumer → PostgreSQL → API query
Dead Letter Queue
Failed messages go to a separate topic for retry or investigation:
def process_with_dlq(msg):
    try:
        process_order(json.loads(msg.value()))
    except Exception as e:
        # Send to dead letter queue
        producer.produce(
            topic="orders-dlq",
            key=msg.key(),
            value=msg.value(),
            headers=[("error", str(e).encode())]
        )
Performance Tuning
Key settings for production:
# Producer: batch messages for throughput
batch.size=65536
linger.ms=10
compression.type=lz4
# Consumer: fetch more data per poll
fetch.min.bytes=1024
fetch.max.wait.ms=500
max.poll.records=500
# Broker: retention and segment settings
log.retention.hours=168
log.segment.bytes=1073741824
num.partitions=12
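On the client side, the producer settings above translate directly into the config dict passed to confluent-kafka. The values below are the same starting points as above, not definitive recommendations — tune them against your own workload:

```python
# Producer tuning expressed as a confluent-kafka (librdkafka) config dict.
# Setting names are standard; the values are starting points, not gospel.
producer_config = {
    'bootstrap.servers': 'localhost:9092',
    'batch.size': 65536,           # batch up to 64 KiB per partition
    'linger.ms': 10,               # wait up to 10 ms to fill a batch
    'compression.type': 'lz4',     # compress batches for throughput
}
print(producer_config)
```

Larger batches and a small linger trade a few milliseconds of latency for substantially higher throughput, which is usually the right trade for event pipelines.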
When to Use Kafka
Use Kafka when:
- You need high-throughput, durable event streams read by multiple independent consumers
- You need to replay history (event sourcing, reprocessing after a bug fix)
- You are decoupling many producers from many consumers at scale
Skip Kafka when:
- A simple task queue (RabbitMQ, SQS, Redis) covers a single producer/consumer pair
- Your event volume is low and operating a broker cluster is not worth the overhead
At TechSaaS, we deploy Kafka for clients with genuine streaming needs and recommend simpler alternatives when the scale does not warrant it. The right tool for the right job.
Need help with Kafka?
TechSaaS provides expert consulting and managed services for cloud infrastructure, DevOps, and AI/ML operations.