Apache Kafka for Beginners: Stream Processing in 2026
Learn Apache Kafka from scratch. Producers, consumers, topics, partitions, and stream processing explained with Docker setup and real code examples.
What Is Apache Kafka?
Apache Kafka is a distributed event streaming platform. Think of it as an extremely fast, durable, append-only log that multiple applications can write to and read from simultaneously. It handles millions of events per second and stores them reliably across a cluster.
If you have ever needed to move data between systems in real time — user events to analytics, orders to fulfillment, logs to monitoring — Kafka is the tool.
Core Concepts
Topics
A topic is a named stream of events. Think of it as a database table, but append-only: you never update or delete individual events, you only add new ones (old events are eventually removed by the retention policy).
Topic: "user-events"
Event 1: {user: "alice", action: "login", timestamp: "..."}
Event 2: {user: "bob", action: "purchase", timestamp: "..."}
Event 3: {user: "alice", action: "logout", timestamp: "..."}
Partitions
Topics are split into partitions for parallelism. The producer chooses a partition by hashing the event key, so events with the same key always land on the same partition. Ordering is guaranteed per key, not across the whole topic.
Topic: "orders" (3 partitions)
Partition 0: events whose key hashes here (e.g. CUST-003, CUST-008)
Partition 1: events whose key hashes here (e.g. CUST-001, CUST-004)
Partition 2: events whose key hashes here (e.g. CUST-002, CUST-005)
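The key-to-partition mapping can be sketched in a few lines. This is illustrative only: real Kafka clients hash key bytes with murmur2 by default, but any stable hash demonstrates the property that matters.

```python
import zlib

def pick_partition(key: str, num_partitions: int) -> int:
    # Illustrative stand-in for Kafka's partitioner: a stable hash of the
    # key, modulo the partition count.
    return zlib.crc32(key.encode("utf-8")) % num_partitions

# The same key always maps to the same partition, preserving per-key order:
print(pick_partition("CUST-001", 3) == pick_partition("CUST-001", 3))  # True
```

Because the mapping depends on the partition count, adding partitions later reshuffles which keys go where, so pick a partition count with headroom up front.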
Producers and Consumers
Producers write events. Consumers read events. Consumer groups enable parallel processing: Kafka assigns each partition to exactly one consumer within a group, so adding consumers (up to the partition count) spreads the work.
Docker Setup
Get Kafka running in 60 seconds with KRaft mode (no Zookeeper needed since Kafka 3.3):
# docker-compose.yml
version: '3.8'
services:
  kafka:
    image: apache/kafka:3.7.0
    container_name: kafka
    ports:
      - "9092:9092"
    environment:
      KAFKA_NODE_ID: 1
      KAFKA_PROCESS_ROLES: broker,controller
      KAFKA_LISTENERS: PLAINTEXT://0.0.0.0:9092,CONTROLLER://0.0.0.0:9093
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://localhost:9092
      KAFKA_CONTROLLER_QUORUM_VOTERS: 1@kafka:9093
      KAFKA_CONTROLLER_LISTENER_NAMES: CONTROLLER
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
      KAFKA_LOG_RETENTION_HOURS: 168  # 7 days
    volumes:
      - kafka_data:/var/lib/kafka/data
  kafka-ui:
    image: provectuslabs/kafka-ui:latest
    container_name: kafka-ui
    ports:
      - "8080:8080"
    environment:
      KAFKA_CLUSTERS_0_NAME: local
      KAFKA_CLUSTERS_0_BOOTSTRAPSERVERS: kafka:9092
    depends_on:
      - kafka
volumes:
  kafka_data:
docker compose up -d
# Kafka UI available at http://localhost:8080
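With the containers running, a quick smoke test: create the "orders" topic used in the examples below and echo one message through it. These commands assume the apache/kafka image, which ships its CLI tools under /opt/kafka/bin.

```shell
# Create the "orders" topic with 3 partitions
docker exec kafka /opt/kafka/bin/kafka-topics.sh \
  --bootstrap-server localhost:9092 \
  --create --topic orders --partitions 3

# Produce a test message from stdin
echo 'hello' | docker exec -i kafka /opt/kafka/bin/kafka-console-producer.sh \
  --bootstrap-server localhost:9092 --topic orders

# Read it back from the beginning
docker exec kafka /opt/kafka/bin/kafka-console-consumer.sh \
  --bootstrap-server localhost:9092 --topic orders \
  --from-beginning --max-messages 1
```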
Producing Events (Python)
from confluent_kafka import Producer
import json
import time

producer = Producer({
    'bootstrap.servers': 'localhost:9092',
    'client.id': 'order-service'
})

def delivery_callback(err, msg):
    if err:
        print(f"Delivery failed: {err}")
    else:
        print(f"Delivered to {msg.topic()} [{msg.partition()}] @ {msg.offset()}")

# Produce events
for i in range(100):
    event = {
        "order_id": f"ORD-{i:04d}",
        "customer_id": f"CUST-{i % 10:03d}",
        "amount": round(10 + i * 1.5, 2),
        "timestamp": time.time()
    }
    producer.produce(
        topic="orders",
        key=event["customer_id"],  # Same customer → same partition
        value=json.dumps(event),
        callback=delivery_callback
    )
    producer.poll(0)  # Serve delivery callbacks as we go

producer.flush()  # Wait for all deliveries
print("All events produced")
Consuming Events (Python)
from confluent_kafka import Consumer
import json

consumer = Consumer({
    'bootstrap.servers': 'localhost:9092',
    'group.id': 'order-processor',
    'auto.offset.reset': 'earliest',
    'enable.auto.commit': False  # Manual commit for reliability
})

consumer.subscribe(['orders'])

try:
    while True:
        msg = consumer.poll(timeout=1.0)
        if msg is None:
            continue
        if msg.error():
            print(f"Error: {msg.error()}")
            continue

        event = json.loads(msg.value())
        print(f"Processing order {event['order_id']}: {event['amount']}")

        # Process the event (save to DB, trigger fulfillment, etc.)
        process_order(event)

        # Commit offset after successful processing
        consumer.commit(asynchronous=False)
except KeyboardInterrupt:
    pass
finally:
    consumer.close()
Producing Events (Node.js)
import { Kafka } from 'kafkajs';

const kafka = new Kafka({
  clientId: 'notification-service',
  brokers: ['localhost:9092']
});

const producer = kafka.producer();

async function sendNotification(userId, message) {
  await producer.connect();
  await producer.send({
    topic: 'notifications',
    messages: [
      {
        key: userId,
        value: JSON.stringify({
          userId,
          message,
          channel: 'email',
          timestamp: new Date().toISOString()
        })
      }
    ]
  });
}

// Consumer
const consumer = kafka.consumer({ groupId: 'email-sender' });

async function startEmailConsumer() {
  await consumer.connect();
  await consumer.subscribe({ topic: 'notifications', fromBeginning: true });
  await consumer.run({
    eachMessage: async ({ topic, partition, message }) => {
      const notification = JSON.parse(message.value.toString());
      console.log(`Sending email to ${notification.userId}: ${notification.message}`);
      await sendEmail(notification);
    }
  });
}
Common Patterns
Event Sourcing
Store every state change as an event. Reconstruct current state by replaying events:
Topic: "account-events"
{type: "CREATED", accountId: "A1", balance: 0}
{type: "DEPOSITED", accountId: "A1", amount: 1000}
{type: "WITHDRAWN", accountId: "A1", amount: 200}
// Current balance: replay → 0 + 1000 - 200 = 800
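The replay above is just a fold over the event history. A minimal sketch, using the event shapes from the example (field names are assumptions):

```python
def replay_balance(events):
    """Rebuild an account's current balance by replaying its events in order."""
    balance = 0
    for e in events:
        if e["type"] == "CREATED":
            balance = e.get("balance", 0)
        elif e["type"] == "DEPOSITED":
            balance += e["amount"]
        elif e["type"] == "WITHDRAWN":
            balance -= e["amount"]
    return balance

events = [
    {"type": "CREATED", "accountId": "A1", "balance": 0},
    {"type": "DEPOSITED", "accountId": "A1", "amount": 1000},
    {"type": "WITHDRAWN", "accountId": "A1", "amount": 200},
]
print(replay_balance(events))  # 800
```

Because Kafka guarantees order within a partition, keying events by accountId is what makes this replay deterministic.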
CQRS (Command Query Responsibility Segregation)
Write to Kafka, consume into read-optimized stores:
Write path: API → Kafka "orders" topic
Read path: Kafka consumer → PostgreSQL → API query
Dead Letter Queue
Failed messages go to a separate topic for retry or investigation:
def process_with_dlq(msg):
    try:
        process_order(json.loads(msg.value()))
    except Exception as e:
        # Send to dead letter queue
        producer.produce(
            topic="orders-dlq",
            key=msg.key(),
            value=msg.value(),
            headers=[("error", str(e).encode())]
        )
Performance Tuning
Key settings for production:
# Producer: batch messages for throughput
batch.size=65536
linger.ms=10
compression.type=lz4
# Consumer: fetch more data per poll
fetch.min.bytes=1024
fetch.max.wait.ms=500
max.poll.records=500
# Broker: retention and segment settings
log.retention.hours=168
log.segment.bytes=1073741824
num.partitions=12
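If you use confluent_kafka as in the earlier examples, the client-side settings translate to librdkafka property names, which mostly match the Java names. Two caveats to note: fetch.max.wait.ms is spelled fetch.wait.max.ms in librdkafka, and max.poll.records is a Java-client setting with no librdkafka equivalent. A sketch as plain config dicts:

```python
# Client-side tuning expressed as confluent_kafka (librdkafka) config dicts.
producer_config = {
    "bootstrap.servers": "localhost:9092",
    "batch.size": 65536,        # batch up to 64 KiB per partition
    "linger.ms": 10,            # wait up to 10 ms to fill a batch
    "compression.type": "lz4",  # compress batches on the wire
}

consumer_config = {
    "bootstrap.servers": "localhost:9092",
    "group.id": "order-processor",
    "fetch.min.bytes": 1024,    # don't return a fetch until 1 KiB is ready...
    "fetch.wait.max.ms": 500,   # ...or 500 ms have passed
}
```

Pass these dicts straight to Producer(...) and Consumer(...) as in the earlier examples.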
When to Use Kafka
Use Kafka when:
- You need real-time event processing across multiple services
- You need event replay capability
- You have high throughput requirements (>10K events/sec)
- You need durable, ordered event streams
Skip Kafka when:
- Simple request-response patterns (use HTTP)
- Low volume messaging (<100 events/sec, use Redis Pub/Sub)
- You need complex routing (use RabbitMQ)
- Your team is small and cannot operate a distributed system
At TechSaaS, we deploy Kafka for clients with genuine streaming needs and recommend simpler alternatives when the scale does not warrant it. The right tool for the right job.