Building a Private AI Chatbot for Your Company with Ollama and Open WebUI
Deploy a fully private AI chatbot using Ollama and Open WebUI on your own servers. No data leaves your network. Complete setup guide with Docker Compose.
Why Your Company Needs a Private AI Chatbot
Every enterprise wants the productivity gains of ChatGPT. Few want their proprietary data flowing through OpenAI's servers. Internal chat logs, customer data, financial projections, and trade secrets have no business leaving your network perimeter.
The solution is straightforward: run a large language model locally using Ollama and give your team a polished web interface with Open WebUI. At TechSaaS, we deploy this stack for clients in a single afternoon.
The Architecture
The stack is simple, with just three parts:
- Ollama: Model runtime that manages downloading, quantizing, and serving LLMs
- Open WebUI: Feature-rich chat interface with multi-user support, conversation history, RAG, and model management
- Your hardware: Any machine with 16GB+ RAM (CPU inference) or an NVIDIA GPU (fast inference)
All traffic stays internal. Zero external API calls. Zero per-token costs after setup.
Hardware Requirements
For a team of 10-50 people with sequential queries:
| Model Size | RAM Needed | GPU VRAM | Speed |
|---|---|---|---|
| 7-8B (Mistral 7B, Llama 3.1 8B) | 8GB | 6GB | Fast |
| 13B (CodeLlama, Llama 2) | 16GB | 10GB | Good |
| 70B (Llama 3.1 70B) | 48GB | 40GB | Moderate |
A single NVIDIA RTX 4090 (24GB VRAM) handles 13B models with room to spare. For CPU-only, expect 5-10 tokens/second on a modern Ryzen or Xeon.
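The VRAM column follows a rough rule of thumb: 4-bit quantized weights take about half a byte per parameter, plus headroom for the KV cache and runtime. A quick back-of-envelope sketch (the 20% overhead factor is an assumption, not a measurement):

```shell
# Estimate VRAM for a 4-bit (Q4) model: ~0.5 bytes/parameter + ~20% overhead
awk 'BEGIN {
  params = 13e9                      # 13B parameters
  gb = params * 0.5 / 1e9 * 1.2      # weights plus KV-cache headroom
  printf "13B Q4 model: ~%.1f GB VRAM\n", gb
}'
```

This prints roughly 7.8 GB for a 13B model, comfortably inside a 24GB card; the table's 10GB figure simply leaves more margin for longer contexts.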
Docker Compose Setup
Create a docker-compose.yml:
```yaml
version: '3.8'

services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    restart: unless-stopped
    volumes:
      - ollama_data:/root/.ollama
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    networks:
      - ai-net

  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    restart: unless-stopped
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
      - WEBUI_AUTH=true
      - WEBUI_SECRET_KEY=change-this-to-a-random-string
    volumes:
      - webui_data:/app/backend/data
    ports:
      - "3000:8080"
    depends_on:
      - ollama
    networks:
      - ai-net

volumes:
  ollama_data:
  webui_data:

networks:
  ai-net:
    driver: bridge
```
Start everything:
```shell
docker compose up -d
```
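Before pulling models, it is worth sanity-checking the stack. Assuming the compose file above (the `nvidia-smi` line applies to GPU hosts only):

```shell
# Both containers should report a running state
docker compose ps

# On GPU hosts, the container should see the card via the NVIDIA Container Toolkit
docker exec ollama nvidia-smi

# The model list is empty until the first pull
docker exec ollama ollama list
```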
Pulling Your First Model
Once Ollama is running, pull a model:
```shell
# Fast and capable general-purpose model
docker exec ollama ollama pull llama3.1:8b

# Excellent for code generation
docker exec ollama ollama pull codellama:13b

# Small but fast for simple tasks
docker exec ollama ollama pull phi3:mini
```
Open http://localhost:3000 in a browser, create the first account (it becomes the admin), and start chatting. Pulled models appear in the model selector automatically.
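The chat UI is not the only way in: internal tools can call Ollama's REST API directly. A minimal sketch, which assumes you either run it from a container on `ai-net` or add a `ports: - "11434:11434"` mapping to the `ollama` service (the compose file above does not publish that port):

```shell
# One-shot completion via the Ollama HTTP API; stream disabled for a single JSON reply
curl -s http://localhost:11434/api/generate -d '{
  "model": "llama3.1:8b",
  "prompt": "Explain RAG in one sentence.",
  "stream": false
}'
```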
[Diagram: RAG architecture. User prompts are embedded, matched against a vector store, then fed to an LLM with retrieved context.]
Adding RAG for Internal Documents
Open WebUI has built-in RAG (Retrieval-Augmented Generation). Upload PDFs, markdown files, or paste text into the knowledge base:
- Go to Workspace > Knowledge
- Create a new collection (e.g., "Engineering Docs")
- Upload your documents
- In any chat, click the + button and attach the knowledge base
- The model now answers questions using your documents as context
For larger document sets, configure an external vector database:
```yaml
# Add under `services:` in docker-compose.yml
chromadb:
  image: chromadb/chroma:latest
  container_name: chromadb
  volumes:
    - chroma_data:/chroma/chroma
  networks:
    - ai-net
```

Also declare `chroma_data` under the top-level `volumes:` key.
Then point Open WebUI's RAG settings to http://chromadb:8000.
Putting It Behind Authentication
If you are running Traefik and Authelia (like we do at TechSaaS), add labels to the Open WebUI service:
```yaml
labels:
  - "traefik.enable=true"
  - "traefik.http.routers.ai-chat.rule=Host(`chat.internal.company.com`)"
  - "traefik.http.routers.ai-chat.middlewares=authelia@file"
  - "traefik.http.services.ai-chat.loadbalancer.server.port=8080"
```
This ensures only authenticated employees can access the chatbot while keeping it off the public internet.
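One caveat: the base compose file publishes port 3000 directly on the host, which would let anyone on the network bypass Authelia. When Traefik fronts the service, drop that mapping so the only path in is the authenticated route. A sketch of the adjusted service under that assumption:

```yaml
open-webui:
  image: ghcr.io/open-webui/open-webui:main
  container_name: open-webui
  restart: unless-stopped
  # environment, volumes, and depends_on as before; note there is no
  # `ports:` mapping, so Traefik reaches port 8080 only over ai-net
  labels:
    - "traefik.enable=true"
    - "traefik.http.routers.ai-chat.rule=Host(`chat.internal.company.com`)"
    - "traefik.http.routers.ai-chat.middlewares=authelia@file"
    - "traefik.http.services.ai-chat.loadbalancer.server.port=8080"
  networks:
    - ai-net
```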
Performance Tuning
A few tips to maximize throughput:
- Enable GPU acceleration: ensure the NVIDIA Container Toolkit is installed and the `deploy` block is present in your compose file
- Use quantized models: 4-bit quantized (Q4_K_M) models run 2-3x faster with minimal quality loss
- Set context limits: `OLLAMA_NUM_CTX=4096` prevents memory exhaustion on long conversations
- Monitor VRAM: `nvidia-smi -l 1` watches GPU memory in real time
```shell
# Check which models are loaded and their memory usage
docker exec ollama ollama ps

# See available models
docker exec ollama ollama list
```
Cost Comparison
Running Llama 3.1 8B locally for a 50-person team:
- Hardware: ~$300/month (dedicated GPU server) or $0 if you own the hardware
- Per-query cost: $0.00
- Monthly queries: Unlimited
Equivalent with OpenAI GPT-4o:
- 50 users x 100 queries/day x ~1,000 tokens each is roughly 150M tokens/month: on the order of $1,500/month at GPT-4o rates, and considerably more once long RAG contexts multiply the input tokens
- Plus data privacy concerns: Priceless risk
The self-hosted option pays for itself in month one.
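As a back-of-envelope check on the API-side figure (the $10 per million tokens is an assumed blended rate for an output-heavy GPT-4o mix, not a quoted price):

```shell
# 50 users x 100 queries/day x ~1,000 tokens, 30 days, at ~$10 per 1M tokens
awk 'BEGIN {
  tokens = 50 * 100 * 1000 * 30      # ~150M tokens/month
  printf "~$%.0f/month\n", tokens / 1e6 * 10
}'
```

Against ~$300/month for a rented GPU server, the break-even point arrives immediately.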
What We Deploy for Clients
At TechSaaS, our private AI chatbot package includes Ollama, Open WebUI, ChromaDB for RAG, Traefik routing, Authelia authentication, automated backups, and monitoring via Grafana. The entire stack runs on a single server alongside your other services. Contact [email protected] to get started.