Apache Kafka for Beginners: Stream Processing in 2026

Learn Apache Kafka from scratch. Producers, consumers, topics, partitions, and stream processing explained with Docker setup and real code examples.

Yash Pritwani
14 min read

What Is Apache Kafka?

Apache Kafka is a distributed event streaming platform. Think of it as an extremely fast, durable, append-only log that multiple applications can write to and read from simultaneously. It handles millions of events per second and stores them reliably across a cluster.


If you have ever needed to move data between systems in real time — user events to analytics, orders to fulfillment, logs to monitoring — Kafka is the tool.

Core Concepts

Topics

A topic is a named stream of events. Think of it as a database table, but append-only. You cannot update or delete events; you can only add new ones.

Topic: "user-events"
  Event 1: {user: "alice", action: "login", timestamp: "..."}
  Event 2: {user: "bob", action: "purchase", timestamp: "..."}
  Event 3: {user: "alice", action: "logout", timestamp: "..."}

Partitions

Topics are split into partitions for parallelism. The default partitioner hashes each event's key to pick a partition, so events with the same key always land on the same partition (guaranteeing order per key).

Topic: "orders" (3 partitions)
  Partition 0: all events whose key hashes to partition 0
  Partition 1: all events whose key hashes to partition 1
  Partition 2: all events whose key hashes to partition 2
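The mapping can be sketched in a few lines. Kafka's real default partitioner uses murmur2; crc32 stands in here just to keep the sketch dependency-free:

```python
import zlib

def pick_partition(key: bytes, num_partitions: int) -> int:
    # Simplified stand-in for Kafka's default partitioner, which hashes the
    # key and takes it modulo the partition count
    return zlib.crc32(key) % num_partitions

# Same key, same partition -- this is what preserves per-key ordering
assert pick_partition(b"CUST-001", 3) == pick_partition(b"CUST-001", 3)
```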

Producers and Consumers

Producers write events. Consumers read events. Consumer groups enable parallel processing: Kafka divides a topic's partitions among the consumers in a group, so each partition is read by exactly one consumer in that group.
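For example, a three-partition topic consumed by a two-member group splits like this:

Topic: "orders" (3 partitions), group: "order-processor"
  Consumer A → partition 0, partition 1
  Consumer B → partition 2

A third consumer would take one partition each; a fourth would sit idle, since each partition is owned by at most one consumer per group.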

Docker Setup

Get Kafka running in 60 seconds with KRaft mode (no ZooKeeper needed since Kafka 3.3):

# docker-compose.yml
version: '3.8'
services:
  kafka:
    image: apache/kafka:3.7.0
    container_name: kafka
    ports:
      - "9092:9092"
    environment:
      KAFKA_NODE_ID: 1
      KAFKA_PROCESS_ROLES: broker,controller
      # Two data listeners: PLAINTEXT for clients on the Docker network,
      # PLAINTEXT_HOST for clients on the host machine (localhost:9092)
      KAFKA_LISTENERS: PLAINTEXT://0.0.0.0:29092,PLAINTEXT_HOST://0.0.0.0:9092,CONTROLLER://0.0.0.0:9093
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka:29092,PLAINTEXT_HOST://localhost:9092
      KAFKA_CONTROLLER_QUORUM_VOTERS: 1@kafka:9093
      KAFKA_CONTROLLER_LISTENER_NAMES: CONTROLLER
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT
      KAFKA_INTER_BROKER_LISTENER_NAME: PLAINTEXT
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
      KAFKA_LOG_RETENTION_HOURS: 168  # 7 days
    volumes:
      - kafka_data:/var/lib/kafka/data

  kafka-ui:
    image: provectuslabs/kafka-ui:latest
    container_name: kafka-ui
    ports:
      - "8080:8080"
    environment:
      KAFKA_CLUSTERS_0_NAME: local
      KAFKA_CLUSTERS_0_BOOTSTRAPSERVERS: kafka:29092
    depends_on:
      - kafka

volumes:
  kafka_data:

docker compose up -d
# Kafka UI available at http://localhost:8080
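With the broker up, you can create a topic explicitly rather than relying on auto-creation. This assumes the apache/kafka image, which ships the CLI tools under /opt/kafka/bin:

```shell
# Create the "orders" topic with 3 partitions
docker exec kafka /opt/kafka/bin/kafka-topics.sh \
  --bootstrap-server localhost:9092 \
  --create --topic orders --partitions 3 --replication-factor 1

# Verify the topic exists and inspect its partition assignments
docker exec kafka /opt/kafka/bin/kafka-topics.sh \
  --bootstrap-server localhost:9092 --describe --topic orders
```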

Producing Events (Python)

from confluent_kafka import Producer
import json
import time

producer = Producer({
    'bootstrap.servers': 'localhost:9092',
    'client.id': 'order-service'
})

def delivery_callback(err, msg):
    if err:
        print(f"Delivery failed: {err}")
    else:
        print(f"Delivered to {msg.topic()} [{msg.partition()}] @ {msg.offset()}")

# Produce events
for i in range(100):
    event = {
        "order_id": f"ORD-{i:04d}",
        "customer_id": f"CUST-{i % 10:03d}",
        "amount": round(10 + i * 1.5, 2),
        "timestamp": time.time()
    }

    producer.produce(
        topic="orders",
        key=event["customer_id"],  # Same customer → same partition
        value=json.dumps(event),
        callback=delivery_callback
    )
    producer.poll(0)  # Serve delivery callbacks without blocking

producer.flush()  # Wait for all deliveries
print("All events produced")


Consuming Events (Python)

from confluent_kafka import Consumer
import json

consumer = Consumer({
    'bootstrap.servers': 'localhost:9092',
    'group.id': 'order-processor',
    'auto.offset.reset': 'earliest',
    'enable.auto.commit': False  # Manual commit for reliability
})

consumer.subscribe(['orders'])

try:
    while True:
        msg = consumer.poll(timeout=1.0)
        if msg is None:
            continue
        if msg.error():
            print(f"Error: {msg.error()}")
            continue

        event = json.loads(msg.value())
        print(f"Processing order {event['order_id']}: {event['amount']}")

        # Process the event (save to DB, trigger fulfillment, etc.)
        process_order(event)

        # Commit offset after successful processing
        consumer.commit(asynchronous=False)

except KeyboardInterrupt:
    pass
finally:
    consumer.close()

Producing and Consuming Events (Node.js)

import { Kafka } from 'kafkajs';

const kafka = new Kafka({
  clientId: 'notification-service',
  brokers: ['localhost:9092']
});

const producer = kafka.producer();

async function sendNotification(userId, message) {
  await producer.connect();
  await producer.send({
    topic: 'notifications',
    messages: [
      {
        key: userId,
        value: JSON.stringify({
          userId,
          message,
          channel: 'email',
          timestamp: new Date().toISOString()
        })
      }
    ]
  });
}

// Consumer
const consumer = kafka.consumer({ groupId: 'email-sender' });

async function startEmailConsumer() {
  await consumer.connect();
  await consumer.subscribe({ topic: 'notifications', fromBeginning: true });

  await consumer.run({
    eachMessage: async ({ topic, partition, message }) => {
      const notification = JSON.parse(message.value.toString());
      console.log(`Sending email to ${notification.userId}: ${notification.message}`);
      await sendEmail(notification);
    }
  });
}

Common Patterns

Event Sourcing

Store every state change as an event. Reconstruct current state by replaying events:

Topic: "account-events"
  {type: "CREATED", accountId: "A1", balance: 0}
  {type: "DEPOSITED", accountId: "A1", amount: 1000}
  {type: "WITHDRAWN", accountId: "A1", amount: 200}
  // Current balance: replay → 0 + 1000 - 200 = 800
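The replay above can be written as a small fold over the event stream (a sketch; the field names match the example):

```python
def replay(events):
    # Reconstruct the current balance by applying events in order
    balance = 0
    for e in events:
        if e["type"] == "CREATED":
            balance = e["balance"]
        elif e["type"] == "DEPOSITED":
            balance += e["amount"]
        elif e["type"] == "WITHDRAWN":
            balance -= e["amount"]
    return balance

events = [
    {"type": "CREATED", "accountId": "A1", "balance": 0},
    {"type": "DEPOSITED", "accountId": "A1", "amount": 1000},
    {"type": "WITHDRAWN", "accountId": "A1", "amount": 200},
]
print(replay(events))  # 800
```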

CQRS (Command Query Responsibility Segregation)

Write to Kafka, consume into read-optimized stores:

Write path: API → Kafka "orders" topic
Read path: Kafka consumer → PostgreSQL → API query
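A minimal sketch of the read path: project order events into a query-optimized store. Here sqlite3 stands in for PostgreSQL, and the event shape matches the producer example above:

```python
import json
import sqlite3

# Read-optimized store the API queries; writes only ever arrive via Kafka
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE IF NOT EXISTS orders (
    order_id TEXT PRIMARY KEY, customer_id TEXT, amount REAL)""")

def apply_event(raw: bytes) -> None:
    """Project one Kafka message into the read model."""
    event = json.loads(raw)
    # Upsert keeps the projection idempotent if a message is redelivered
    db.execute(
        "INSERT OR REPLACE INTO orders VALUES (?, ?, ?)",
        (event["order_id"], event["customer_id"], event["amount"]),
    )
    db.commit()
```

In the consumer loop from earlier, you would call apply_event(msg.value()) before committing the offset.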

Dead Letter Queue

Failed messages go to a separate topic for retry or investigation:

def process_with_dlq(msg):
    try:
        process_order(json.loads(msg.value()))
    except Exception as e:
        # Send to dead letter queue
        producer.produce(
            topic="orders-dlq",
            key=msg.key(),
            value=msg.value(),
            headers=[("error", str(e).encode())]
        )

Performance Tuning

Key settings for production:

# Producer: batch messages for throughput
batch.size=65536
linger.ms=10
compression.type=lz4

# Consumer: fetch more data per poll
fetch.min.bytes=1024
fetch.max.wait.ms=500
max.poll.records=500

# Broker: retention and segment settings
log.retention.hours=168
log.segment.bytes=1073741824
num.partitions=12


When to Use Kafka

Use Kafka when:

You need real-time event processing across multiple services
You need event replay capability
You have high throughput requirements (>10K events/sec)
You need durable, ordered event streams

Skip Kafka when:

Simple request-response patterns (use HTTP)
Low volume messaging (<100 events/sec, use Redis Pub/Sub)
You need complex routing (use RabbitMQ)
Your team is small and cannot operate a distributed system

At TechSaaS, we deploy Kafka for clients with genuine streaming needs and recommend simpler alternatives when the scale does not warrant it. The right tool for the right job.

#kafka #stream-processing #event-driven #docker #messaging #tutorial

Need help with Kafka?

TechSaaS provides expert consulting and managed services for cloud infrastructure, DevOps, and AI/ML operations.