
Building a Private AI Chatbot for Your Company with Ollama and Open WebUI

Deploy a fully private AI chatbot using Ollama and Open WebUI on your own servers. No data leaves your network. Complete setup guide with Docker Compose.

Yash Pritwani
14 min read

Why Your Company Needs a Private AI Chatbot

Every enterprise wants the productivity gains of ChatGPT. Few want their proprietary data flowing through OpenAI's servers. Internal chat logs, customer data, financial projections, and trade secrets have no business leaving your network perimeter.


Neural network architecture: data flows through input, hidden, and output layers.

The solution is straightforward: run a large language model locally using Ollama and give your team a polished web interface with Open WebUI. At TechSaaS, we deploy this stack for clients in a single afternoon.

The Architecture

The stack is refreshingly simple:

  • Ollama: Model runtime that manages downloading, quantizing, and serving LLMs
  • Open WebUI: Feature-rich chat interface with multi-user support, conversation history, RAG, and model management
  • Your hardware: Any machine with 16GB+ RAM (CPU inference) or an NVIDIA GPU (fast inference)

All traffic stays internal. Zero external API calls. Zero per-token costs after setup.

Hardware Requirements

For a team of 10-50 people with sequential queries:

Model Size                  RAM Needed   GPU VRAM   Speed
7B (Mistral, Llama 3.1)     8GB          6GB        Fast
13B (CodeLlama, Llama 2)    16GB         10GB       Good
70B (Llama 3.1 70B)         48GB         40GB       Moderate

A single NVIDIA RTX 4090 (24GB VRAM) handles 13B models with room to spare. For CPU-only, expect 5-10 tokens/second on a modern Ryzen or Xeon.


Docker Compose Setup

Create a docker-compose.yml:

version: '3.8'
services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    restart: unless-stopped
    volumes:
      - ollama_data:/root/.ollama
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    networks:
      - ai-net

  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    restart: unless-stopped
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
      - WEBUI_AUTH=true
      - WEBUI_SECRET_KEY=change-this-to-a-random-string
    volumes:
      - webui_data:/app/backend/data
    ports:
      - "3000:8080"
    depends_on:
      - ollama
    networks:
      - ai-net

volumes:
  ollama_data:
  webui_data:

networks:
  ai-net:
    driver: bridge

Start everything:

docker compose up -d
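Before pulling models, it is worth confirming both containers came up cleanly. A quick check, assuming the service names from the compose file above:

```shell
# Both services should report a running state
docker compose ps

# Scan Ollama's startup logs; with a GPU present you should see
# CUDA/driver detection messages rather than a CPU-only fallback
docker compose logs --tail 30 ollama
```

Note that the compose file above does not publish Ollama's port 11434 to the host; only Open WebUI reaches it over the internal `ai-net` network, which is exactly the point of the setup.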

Pulling Your First Model

Once Ollama is running, pull a model:

# Fast and capable general-purpose model
docker exec ollama ollama pull llama3.1:8b

# Excellent for code generation
docker exec ollama ollama pull codellama:13b

# Small but fast for simple tasks
docker exec ollama ollama pull phi3:mini

Browse to http://localhost:3000, create an admin account (the first signup becomes the administrator), and start chatting. Pulled models appear in the model selector automatically.
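You can also talk to a model straight from the command line, which is handy for smoke tests and scripting. The `ollama run` call below is safe as-is; the HTTP example assumes `curl` is present inside the image (if not, publish port 11434 and run it from the host):

```shell
# One-shot prompt via the ollama CLI inside the container
docker exec ollama ollama run llama3.1:8b "Write a one-line description of RAG."

# Or hit Ollama's HTTP API from inside the compose network
docker exec ollama curl -s http://localhost:11434/api/generate \
  -d '{"model": "llama3.1:8b", "prompt": "Hello", "stream": false}'
```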


RAG architecture: user prompts are embedded, matched against a vector store, then fed to an LLM with retrieved context.

Adding RAG for Internal Documents

Open WebUI has built-in RAG (Retrieval-Augmented Generation). Upload PDFs, markdown files, or paste text into the knowledge base:

  1. Go to Workspace > Knowledge
  2. Create a new collection (e.g., "Engineering Docs")
  3. Upload your documents
  4. In any chat, click the + button and attach the knowledge base
  5. The model now answers questions using your documents as context

For larger document sets, configure an external vector database:

# Add under services: in docker-compose.yml
chromadb:
  image: chromadb/chroma:latest
  container_name: chromadb
  restart: unless-stopped
  volumes:
    - chroma_data:/chroma/chroma
  networks:
    - ai-net

# Remember to also declare chroma_data in the top-level volumes: block

Then point Open WebUI's RAG settings to http://chromadb:8000.

Putting It Behind Authentication

If you are running Traefik and Authelia (like we do at TechSaaS), add labels to the Open WebUI service:

labels:
  - "traefik.enable=true"
  - "traefik.http.routers.ai-chat.rule=Host(`chat.internal.company.com`)"
  - "traefik.http.routers.ai-chat.middlewares=authelia@file"
  - "traefik.http.services.ai-chat.loadbalancer.server.port=8080"

This ensures only authenticated employees can access the chatbot while keeping it off the public internet.

Performance Tuning

A few tips to maximize throughput:

  • Enable GPU acceleration: Ensure the NVIDIA Container Toolkit is installed and the deploy block is present in your compose file
  • Use quantized models: 4-bit quantized (Q4_K_M) models run 2-3x faster with minimal quality loss
  • Set context limits: capping the context window (Ollama's num_ctx parameter, e.g. 4096) prevents memory exhaustion on long conversations
  • Monitor VRAM: nvidia-smi -l 1 watches GPU memory in real time
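As an example of the quantization tip above, Ollama publishes explicitly quantized tags for many models. The exact tag below is an assumption; tag names vary per model, so check the model's page in the Ollama library first (`ollama pull` will simply error on an unknown tag):

```shell
# Pull an explicitly 4-bit-quantized build instead of the default tag
docker exec ollama ollama pull llama3.1:8b-instruct-q4_K_M

# Compare on-disk sizes against the default tag
docker exec ollama ollama list
```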

# Check model memory usage
docker exec ollama ollama ps

# See available models
docker exec ollama ollama list

Cost Comparison

Running Llama 3.1 8B locally for a 50-person team:

  • Hardware: ~$300/month (dedicated GPU server) or $0 if you own the hardware
  • Per-query cost: $0.00
  • Monthly queries: Unlimited

Equivalent usage with OpenAI GPT-4o:

  • 50 users x 100 queries/day x ~1,000 tokens per query: roughly $3,000/month at typical API rates
  • Plus the data-privacy exposure, which no invoice captures

The self-hosted option pays for itself in month one.
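The arithmetic behind that estimate, using an assumed blended rate of $20 per million tokens (actual GPT-4o pricing differs between input and output tokens, so treat this as a rough ceiling):

```shell
# 50 users x 100 queries/day x 1,000 tokens/query x 30 days
tokens=$((50 * 100 * 1000 * 30))
echo "tokens/month: $tokens"
# → tokens/month: 150000000

# Assumed blended API price: $20 per 1M tokens
awk -v t="$tokens" 'BEGIN { printf "API cost: $%.0f/month\n", t / 1000000 * 20 }'
# → API cost: $3000/month
```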


ML pipeline: from raw data collection through training, evaluation, deployment, and continuous monitoring.

What We Deploy for Clients

At TechSaaS, our private AI chatbot package includes Ollama, Open WebUI, ChromaDB for RAG, Traefik routing, Authelia authentication, automated backups, and monitoring via Grafana. The entire stack runs on a single server alongside your other services. Contact [email protected] to get started.

#ollama #open-webui #private-ai #self-hosted #llm #docker
