Apache Kafka for Beginners: Stream Processing in 2026

Learn Apache Kafka from scratch. Producers, consumers, topics, partitions, and stream processing explained with Docker setup and real code examples.

Yash Pritwani
14 min read

What Is Apache Kafka?

Apache Kafka is a distributed event streaming platform. Think of it as an extremely fast, durable, append-only log that multiple applications can write to and read from simultaneously. It handles millions of events per second and stores them reliably across a cluster.


If you have ever needed to move data between systems in real time — user events to analytics, orders to fulfillment, logs to monitoring — Kafka is the tool.

Core Concepts

Topics

A topic is a named stream of events. Think of it as a database table, but append-only. You cannot update or delete events; you can only add new ones.

Topic: "user-events"
  Event 1: {user: "alice", action: "login", timestamp: "..."}
  Event 2: {user: "bob", action: "purchase", timestamp: "..."}
  Event 3: {user: "alice", action: "logout", timestamp: "..."}
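The append-only semantics are easy to sketch in plain Python. This is a toy illustration (a hypothetical `Topic` class, not Kafka's API): every appended event gets a monotonically increasing offset, and readers scan forward from any offset without ever mutating the log.

```python
class Topic:
    """Toy append-only event log: every event gets an offset, nothing is updated."""

    def __init__(self, name):
        self.name = name
        self.events = []

    def append(self, event):
        offset = len(self.events)  # offset = position in the log
        self.events.append(event)
        return offset

    def read_from(self, offset):
        # Readers scan forward from any offset; old events are untouched.
        return self.events[offset:]


log = Topic("user-events")
log.append({"user": "alice", "action": "login"})
log.append({"user": "bob", "action": "purchase"})
print(log.read_from(1))  # [{'user': 'bob', 'action': 'purchase'}]
```

This is also why Kafka consumers track offsets rather than deleting messages: reading is just remembering how far you got.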

Partitions

Topics are split into partitions for parallelism. Events with the same key always go to the same partition (guaranteeing order per key).

Topic: "orders" (3 partitions)
  Partition 0: events whose key hashes to 0
  Partition 1: events whose key hashes to 1
  Partition 2: events whose key hashes to 2

Note that Kafka does not range-partition by ID: the default partitioner hashes the event key and takes it modulo the partition count, so keys are spread across partitions rather than assigned in contiguous blocks.
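Keyed partitioning can be sketched in a few lines. Kafka clients hash the key bytes and take the result modulo the partition count (the Java client uses murmur2); CRC32 stands in here purely to keep the sketch self-contained.

```python
import zlib


def pick_partition(key: str, num_partitions: int) -> int:
    # Hash the key bytes, then modulo the partition count.
    # Real clients use murmur2; CRC32 is a stand-in for illustration.
    return zlib.crc32(key.encode()) % num_partitions


# The same key always maps to the same partition → per-key ordering.
assert pick_partition("CUST-007", 3) == pick_partition("CUST-007", 3)

for key in ["CUST-001", "CUST-002", "CUST-003"]:
    print(key, "-> partition", pick_partition(key, 3))
```

This is also why increasing the partition count of an existing topic breaks per-key ordering guarantees: the modulo changes, so keys remap to different partitions.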


Producers and Consumers

Producers write events. Consumers read events. Consumer groups enable parallel processing: Kafka assigns each partition to exactly one consumer in the group, so members share the work without ever reading the same event twice.
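The assignment logic can be sketched as a round-robin distribution of partitions across group members. This is a static simplification: real Kafka rebalances dynamically as consumers join and leave, and ships several assignment strategies (range, round-robin, sticky).

```python
def assign_partitions(partitions: list[int], consumers: list[str]) -> dict:
    # Round-robin sketch: each partition goes to exactly one consumer,
    # so no two group members read the same partition.
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment


print(assign_partitions([0, 1, 2, 3, 4, 5], ["consumer-a", "consumer-b"]))
# {'consumer-a': [0, 2, 4], 'consumer-b': [1, 3, 5]}
```

One consequence worth internalizing: partitions cap group parallelism. Six partitions feed at most six active consumers in a group; a seventh sits idle.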

Docker Setup

Get Kafka running in 60 seconds with KRaft mode (no Zookeeper needed since Kafka 3.3):

# docker-compose.yml
version: '3.8'
services:
  kafka:
    image: apache/kafka:3.7.0
    container_name: kafka
    ports:
      - "9092:9092"
    environment:
      KAFKA_NODE_ID: 1
      KAFKA_PROCESS_ROLES: broker,controller
      KAFKA_LISTENERS: PLAINTEXT://0.0.0.0:9092,INTERNAL://0.0.0.0:29092,CONTROLLER://0.0.0.0:9093
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://localhost:9092,INTERNAL://kafka:29092
      KAFKA_CONTROLLER_QUORUM_VOTERS: 1@kafka:9093
      KAFKA_CONTROLLER_LISTENER_NAMES: CONTROLLER
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT,INTERNAL:PLAINTEXT
      KAFKA_INTER_BROKER_LISTENER_NAME: INTERNAL
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
      KAFKA_LOG_RETENTION_HOURS: 168  # 7 days
    volumes:
      - kafka_data:/var/lib/kafka/data

  kafka-ui:
    image: provectuslabs/kafka-ui:latest
    container_name: kafka-ui
    ports:
      - "8080:8080"
    environment:
      KAFKA_CLUSTERS_0_NAME: local
      # Containers on the Docker network use the INTERNAL listener;
      # localhost:9092 is only valid from the host machine.
      KAFKA_CLUSTERS_0_BOOTSTRAPSERVERS: kafka:29092
    depends_on:
      - kafka

volumes:
  kafka_data:

Start the stack:

docker compose up -d
# Kafka UI available at http://localhost:8080

Producing Events (Python)

from confluent_kafka import Producer
import json
import time

producer = Producer({
    'bootstrap.servers': 'localhost:9092',
    'client.id': 'order-service'
})

def delivery_callback(err, msg):
    if err:
        print(f"Delivery failed: {err}")
    else:
        print(f"Delivered to {msg.topic()} [{msg.partition()}] @ {msg.offset()}")

# Produce events
for i in range(100):
    event = {
        "order_id": f"ORD-{i:04d}",
        "customer_id": f"CUST-{i % 10:03d}",
        "amount": round(10 + i * 1.5, 2),
        "timestamp": time.time()
    }

    producer.produce(
        topic="orders",
        key=event["customer_id"],  # Same customer → same partition
        value=json.dumps(event),
        callback=delivery_callback
    )

producer.flush()  # Wait for all deliveries
print("All events produced")

Consuming Events (Python)

from confluent_kafka import Consumer
import json

consumer = Consumer({
    'bootstrap.servers': 'localhost:9092',
    'group.id': 'order-processor',
    'auto.offset.reset': 'earliest',
    'enable.auto.commit': False  # Manual commit for reliability
})

consumer.subscribe(['orders'])

try:
    while True:
        msg = consumer.poll(timeout=1.0)
        if msg is None:
            continue
        if msg.error():
            print(f"Error: {msg.error()}")
            continue

        event = json.loads(msg.value())
        print(f"Processing order {event['order_id']}: {event['amount']}")

        # Process the event (save to DB, trigger fulfillment, etc.)
        process_order(event)

        # Commit offset after successful processing
        consumer.commit(asynchronous=False)

except KeyboardInterrupt:
    pass
finally:
    consumer.close()

Producing Events (Node.js)

import { Kafka } from 'kafkajs';

const kafka = new Kafka({
  clientId: 'notification-service',
  brokers: ['localhost:9092']
});

const producer = kafka.producer();

async function sendNotification(userId, message) {
  await producer.connect();
  await producer.send({
    topic: 'notifications',
    messages: [
      {
        key: userId,
        value: JSON.stringify({
          userId,
          message,
          channel: 'email',
          timestamp: new Date().toISOString()
        })
      }
    ]
  });
}

// Consumer
const consumer = kafka.consumer({ groupId: 'email-sender' });

async function startEmailConsumer() {
  await consumer.connect();
  await consumer.subscribe({ topic: 'notifications', fromBeginning: true });

  await consumer.run({
    eachMessage: async ({ topic, partition, message }) => {
      const notification = JSON.parse(message.value.toString());
      console.log(`Sending email to ${notification.userId}: ${notification.message}`);
      await sendEmail(notification);
    }
  });
}

Common Patterns

Event Sourcing

Store every state change as an event. Reconstruct current state by replaying events:

Topic: "account-events"
  {type: "CREATED", accountId: "A1", balance: 0}
  {type: "DEPOSITED", accountId: "A1", amount: 1000}
  {type: "WITHDRAWN", accountId: "A1", amount: 200}
  // Current balance: replay → 0 + 1000 - 200 = 800
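Replaying the account events above is just a fold over the log, applied in order:

```python
def replay(events):
    # Current state = fold over every event, in log order.
    balance = 0
    for e in events:
        if e["type"] == "CREATED":
            balance = e["balance"]
        elif e["type"] == "DEPOSITED":
            balance += e["amount"]
        elif e["type"] == "WITHDRAWN":
            balance -= e["amount"]
    return balance


events = [
    {"type": "CREATED", "accountId": "A1", "balance": 0},
    {"type": "DEPOSITED", "accountId": "A1", "amount": 1000},
    {"type": "WITHDRAWN", "accountId": "A1", "amount": 200},
]
print(replay(events))  # 800
```

Because the log is the source of truth, a bug in the fold can be fixed and the state rebuilt by replaying from offset zero, which is exactly what Kafka's retained, replayable topics make practical.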

CQRS (Command Query Responsibility Segregation)

Write to Kafka, consume into read-optimized stores:

Write path: API → Kafka "orders" topic
Read path: Kafka consumer → PostgreSQL → API query
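The read path is a projection: a consumer folds the event stream into a query-optimized store. A sketch, with a plain dict standing in for PostgreSQL and hypothetical event fields matching the earlier order events:

```python
# Read model: total spend per customer, maintained by consuming order events.
read_model: dict[str, float] = {}


def project(event: dict) -> None:
    # Each consumed event updates the denormalized view.
    cid = event["customer_id"]
    read_model[cid] = read_model.get(cid, 0.0) + event["amount"]


for ev in [
    {"customer_id": "CUST-001", "amount": 25.0},
    {"customer_id": "CUST-001", "amount": 10.0},
    {"customer_id": "CUST-002", "amount": 5.0},
]:
    project(ev)

print(read_model)  # {'CUST-001': 35.0, 'CUST-002': 5.0}
```

The write path never touches this view; if the query shape changes, you drop the table and replay the topic to rebuild it.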

Dead Letter Queue


Failed messages go to a separate topic for retry or investigation:

def process_with_dlq(msg):
    try:
        process_order(json.loads(msg.value()))
    except Exception as e:
        # Send to dead letter queue
        producer.produce(
            topic="orders-dlq",
            key=msg.key(),
            value=msg.value(),
            headers=[("error", str(e).encode())]
        )
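A common refinement, not shown in the snippet above, is to retry a few times before routing to the DLQ. Sketched here with the processor and DLQ sender passed in as plain callables so the logic stays self-contained:

```python
def process_with_retries(event, process, send_to_dlq, max_attempts=3):
    """Try `process` up to max_attempts times; on the final failure,
    hand the event to the DLQ sender instead of dropping it."""
    for attempt in range(1, max_attempts + 1):
        try:
            process(event)
            return True
        except Exception as exc:
            if attempt == max_attempts:
                send_to_dlq(event, error=str(exc))
    return False


# Demo: an always-failing processor, with a list standing in for the DLQ topic.
dlq = []


def flaky_processor(event):
    raise RuntimeError("downstream DB unavailable")


process_with_retries({"order_id": "ORD-0001"}, flaky_processor,
                     lambda ev, error: dlq.append((ev, error)))
print(len(dlq))  # 1
```

In production the `send_to_dlq` callable would wrap `producer.produce(...)` as in the snippet above, and you would only commit the offset once the event has landed either downstream or in the DLQ.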

Performance Tuning

Key settings for production:

# Producer: batch messages for throughput
batch.size=65536
linger.ms=10
compression.type=lz4

# Consumer: fetch more data per poll
fetch.min.bytes=1024
fetch.max.wait.ms=500
max.poll.records=500

# Broker: retention and segment settings
log.retention.hours=168
log.segment.bytes=1073741824
num.partitions=12
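In confluent_kafka, the producer-side settings above pass straight through as config keys. The values here mirror the property file rather than being tuned recommendations; `localhost:9092` assumes the local Docker broker from earlier.

```python
# Producer tuning as a confluent_kafka config dict — same keys as the
# property-file form above.
producer_config = {
    "bootstrap.servers": "localhost:9092",
    "batch.size": 65536,        # batch up to 64 KiB per partition
    "linger.ms": 10,            # wait up to 10 ms to fill a batch
    "compression.type": "lz4",  # compress whole batches, not single messages
}

# With confluent_kafka installed:
# from confluent_kafka import Producer
# producer = Producer(producer_config)
print(producer_config["compression.type"])  # lz4
```

The trade-off is explicit: `linger.ms` adds up to 10 ms of latency per send in exchange for larger batches, which compress better and cut per-request overhead.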

When to Use Kafka

Use Kafka when:

  • You need real-time event processing across multiple services
  • You need event replay capability
  • You have high throughput requirements (>10K events/sec)
  • You need durable, ordered event streams

Skip Kafka when:

  • Simple request-response patterns (use HTTP)
  • Low volume messaging (<100 events/sec, use Redis Pub/Sub)
  • You need complex routing (use RabbitMQ)
  • Your team is small and cannot operate a distributed system

At TechSaaS, we deploy Kafka for clients with genuine streaming needs and recommend simpler alternatives when the scale does not warrant it. The right tool for the right job.

#kafka#stream-processing#event-driven#docker#messaging#tutorial
