Computer Vision at the Edge: Deploying YOLO Models on Cheap Hardware
Deploy YOLOv8 and YOLOv11 object detection models on Raspberry Pi, Jetson Nano, and budget hardware. Complete guide with optimization and benchmarks.
Why Edge Computer Vision?
Sending every camera frame to the cloud for processing means latency, bandwidth costs, and privacy concerns. Edge inference — running the model on the camera device itself — solves all three. A Raspberry Pi 5 running YOLOv8n processes 15-30 frames per second. That is fast enough for security cameras, retail analytics, quality inspection, and robotics.
Hardware Options and Their Capabilities
| Device | Price | FPS (YOLOv8n) | Power | Best For |
|---|---|---|---|---|
| Raspberry Pi 5 (8GB) | $80 | 15-25 | 5W | Prototyping, light duty |
| NVIDIA Jetson Nano | $150 | 30-45 | 10W | Production edge AI |
| Jetson Orin Nano | $250 | 60-80 | 15W | Multi-camera, real-time |
| Orange Pi 5 (RK3588) | $90 | 20-35 (NPU) | 8W | Budget production |
| Old laptop + GTX 1650 | $200 used | 80-120 | 60W | High throughput |
At TechSaaS, we have deployed edge vision systems on everything from Raspberry Pis to refurbished laptops with discrete GPUs. The GTX 1650 in our Proxmox server handles real-time inference beautifully.
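The FPS figures above vary with firmware, cooling, and model build, so benchmark your own device before committing to hardware. A minimal timing harness; the `measure_fps` helper and the 20 ms stand-in model are illustrative, not part of ultralytics (swap in a real call like `lambda f: model(f, verbose=False)`):

```python
import time

def measure_fps(infer, frames, warmup=3):
    """Time an inference callable over a list of frames and return FPS.

    `infer` is any function that takes one frame; warm-up runs are
    excluded from timing so lazy initialization doesn't skew results.
    """
    for frame in frames[:warmup]:
        infer(frame)
    start = time.perf_counter()
    for frame in frames:
        infer(frame)
    elapsed = time.perf_counter() - start
    return len(frames) / elapsed

# Stand-in "model" that just sleeps 20 ms per frame
fake_frames = [None] * 25
fps = measure_fps(lambda f: time.sleep(0.02), fake_frames)
print(f"{fps:.1f} FPS")  # ~50 FPS on the stand-in
```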
Setting Up YOLOv8 on a Raspberry Pi 5
```bash
# Install system dependencies
sudo apt update && sudo apt install -y python3-pip libopencv-dev

# Install ultralytics (YOLO library)
pip3 install ultralytics opencv-python-headless

# Test with a sample image
yolo detect predict model=yolov8n.pt source=bus.jpg
```
For real-time camera inference:
```python
from ultralytics import YOLO
import cv2

model = YOLO("yolov8n.pt")  # Nano model, fastest
cap = cv2.VideoCapture(0)   # USB camera or CSI camera
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)

while True:
    ret, frame = cap.read()
    if not ret:
        break

    results = model(frame, verbose=False, conf=0.5)

    # Draw bounding boxes
    annotated = results[0].plot()

    # Process detections
    for box in results[0].boxes:
        cls = int(box.cls[0])
        conf = float(box.conf[0])
        label = model.names[cls]
        print(f"Detected: {label} ({conf:.2f})")

    cv2.imshow("YOLO Edge", annotated)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()
```
Model Optimization for Edge Devices
ONNX Export
Converting to ONNX typically gives a 1.5-2x speedup:
```python
from ultralytics import YOLO

model = YOLO("yolov8n.pt")
model.export(format="onnx", imgsz=640, simplify=True, opset=12)
```

Then use the ONNX model:

```python
model = YOLO("yolov8n.onnx")
results = model("image.jpg")
```
TensorRT for NVIDIA Devices
On Jetson or any NVIDIA GPU, TensorRT provides the best performance:
```python
model = YOLO("yolov8n.pt")
model.export(format="engine", imgsz=640, half=True)  # FP16 TensorRT

# Use the optimized engine
model = YOLO("yolov8n.engine")
results = model("image.jpg")  # 2-3x faster than PyTorch
```
INT8 Quantization
For maximum speed on CPU, quantize to 8-bit integers:
```python
# The ONNX exporter has no int8 flag in ultralytics; INT8 on CPU goes
# through the OpenVINO target, with a dataset for calibration
model.export(format="openvino", imgsz=640, int8=True,
             data="calibration_dataset.yaml")
```
This reduces model size by 4x and speeds up inference 2-3x with only a 1-2% accuracy drop.
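The 4x size reduction follows directly from the storage width: FP32 weights take 4 bytes each, INT8 weights take 1. A back-of-the-envelope check against YOLOv8n's roughly 3.2M parameters (actual file sizes also include headers and any layers left unquantized):

```python
params = 3_200_000            # ~YOLOv8n parameter count
fp32_mb = params * 4 / 1e6    # 4 bytes per FP32 weight
int8_mb = params * 1 / 1e6    # 1 byte per INT8 weight
print(f"FP32: {fp32_mb:.1f} MB, INT8: {int8_mb:.1f} MB, "
      f"ratio: {fp32_mb / int8_mb:.0f}x")
# FP32: 12.8 MB, INT8: 3.2 MB, ratio: 4x
```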
Building a Complete Edge Vision Pipeline
Here is a production-ready pipeline that detects objects and sends alerts:
```python
import time
import requests
from ultralytics import YOLO
import cv2


class EdgeVisionPipeline:
    def __init__(self, model_path, webhook_url, alert_classes=None):
        self.model = YOLO(model_path)
        self.webhook_url = webhook_url
        self.alert_classes = alert_classes or ["person"]
        self.last_alert = {}
        self.cooldown = 30  # seconds between alerts per class

    def should_alert(self, class_name: str) -> bool:
        now = time.time()
        last = self.last_alert.get(class_name, 0)
        if now - last > self.cooldown:
            self.last_alert[class_name] = now
            return True
        return False

    def send_alert(self, detections: list):
        payload = {
            "timestamp": time.strftime("%Y-%m-%d %H:%M:%S"),
            "device": "edge-cam-01",
            "detections": detections,
        }
        try:
            requests.post(self.webhook_url, json=payload, timeout=5)
        except requests.RequestException:
            pass  # Log but don't crash

    def run(self, camera_id=0):
        cap = cv2.VideoCapture(camera_id)
        print(f"Starting edge vision on camera {camera_id}")
        while True:
            ret, frame = cap.read()
            if not ret:
                time.sleep(1)
                continue
            results = self.model(frame, verbose=False, conf=0.5)
            alerts = []
            for box in results[0].boxes:
                label = self.model.names[int(box.cls[0])]
                if label in self.alert_classes and self.should_alert(label):
                    alerts.append({
                        "class": label,
                        "confidence": round(float(box.conf[0]), 2),
                    })
            if alerts:
                self.send_alert(alerts)


# Usage
pipeline = EdgeVisionPipeline(
    model_path="yolov8n.onnx",
    webhook_url="https://n8n.techsaas.cloud/webhook/vision-alert",
    alert_classes=["person", "car", "dog"],
)
pipeline.run(camera_id=0)
```
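One weakness of the pipeline above: `send_alert` silently drops alerts whenever the webhook is unreachable. A sketch of a retry buffer that holds failed alerts until the link comes back; the `AlertBuffer` class, its 500-item cap, and the simulated outage below are all illustrative, not part of any library:

```python
import collections

class AlertBuffer:
    """Queue alerts that failed to send and retry them on the next attempt.

    `send` is any callable that raises on failure (e.g. a requests.post
    wrapper); maxlen bounds memory during long outages by discarding
    the oldest alerts first.
    """
    def __init__(self, send, maxlen=500):
        self.send = send
        self.pending = collections.deque(maxlen=maxlen)

    def submit(self, alert):
        self.pending.append(alert)
        self.flush()

    def flush(self):
        while self.pending:
            try:
                self.send(self.pending[0])
            except Exception:
                return  # still offline; keep the rest queued
            self.pending.popleft()

# Simulated outage: the first two sends fail, then the link comes back
sent, state = [], {"up": False}
def fake_send(alert):
    if not state["up"]:
        raise ConnectionError("offline")
    sent.append(alert)

buf = AlertBuffer(fake_send)
buf.submit({"class": "person"})
buf.submit({"class": "car"})
state["up"] = True
buf.submit({"class": "dog"})
print(sent)  # all three alerts delivered, in order
```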
Training Custom Models
YOLO excels at custom object detection. To detect specific items (defective products, safety gear, specific vehicles):
```bash
# Prepare dataset in YOLO format:
#   images/train/, images/val/
#   labels/train/, labels/val/

# Train on a GPU machine (doesn't need to be the edge device)
yolo detect train data=custom_dataset.yaml model=yolov8n.pt epochs=100 imgsz=640

# Export optimized model for your edge device
yolo export model=runs/detect/train/weights/best.pt format=onnx
```
Label your data with tools like Label Studio (self-hostable) or Roboflow. 200-500 labeled images per class is usually sufficient for good results.
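Each label file in the YOLO format holds one line per object: a class ID followed by the box center x/y and width/height, all normalized to [0, 1]. If you have pixel-space boxes from another tool, the conversion is straightforward; the `to_yolo_line` helper and the 640x480 example are illustrative:

```python
def to_yolo_line(class_id, x1, y1, x2, y2, img_w, img_h):
    """Convert a pixel-space box (x1, y1, x2, y2) to a YOLO label line:
    'class x_center y_center width height', normalized to [0, 1]."""
    xc = (x1 + x2) / 2 / img_w
    yc = (y1 + y2) / 2 / img_h
    w = (x2 - x1) / img_w
    h = (y2 - y1) / img_h
    return f"{class_id} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}"

# A 100x200-pixel box at the top-left of a 640x480 image
line = to_yolo_line(0, 0, 0, 100, 200, 640, 480)
print(line)  # 0 0.078125 0.208333 0.156250 0.416667
```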
Power and Thermal Management
Edge devices run 24/7. Manage power and heat:
```bash
# Raspberry Pi: monitor temperature
vcgencmd measure_temp
```

Run the pipeline under systemd so it restarts automatically after a crash or thermal shutdown:

```ini
# /etc/systemd/system/vision-watchdog.service
[Service]
ExecStart=/usr/bin/python3 /usr/local/bin/vision-pipeline.py
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target
```
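If you script temperature checks yourself, `vcgencmd measure_temp` prints a string like `temp=48.3'C` that needs parsing. A minimal sketch; the `pi_temp_c` helper and the threshold constant are illustrative:

```python
import re
import subprocess

def pi_temp_c(raw=None):
    """Return the SoC temperature in Celsius.

    Parses `vcgencmd measure_temp` output ("temp=48.3'C"). Pass `raw`
    to parse a string directly, e.g. in tests or on non-Pi hardware.
    """
    if raw is None:
        raw = subprocess.check_output(["vcgencmd", "measure_temp"], text=True)
    match = re.search(r"temp=([\d.]+)", raw)
    return float(match.group(1))

THROTTLE_AT = 80.0  # illustrative; Pi firmware throttles around 80-85C
temp = pi_temp_c(raw="temp=48.3'C")
print(temp, temp > THROTTLE_AT)  # 48.3 False
```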
Use a heatsink and fan: at 25 FPS continuous, an actively cooled Pi 5 sits at 65-75°C, safely below the firmware's throttling threshold.
Real-World Deployment Tips
- Process every Nth frame: if you need detection rather than tracking, process every 3rd frame and cut inference load (and power draw) by roughly 3x
- Region of interest: Crop the frame to only analyze relevant areas
- Model selection matters: YOLOv8n (nano) runs roughly 3x faster than YOLOv8s (small) at the cost of only about 5% accuracy on most tasks
- Network resilience: Buffer alerts locally when WiFi drops, send when reconnected
- Remote model updates: Pull new models via HTTP without redeploying the application
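The first two tips combine naturally: drop frames before cropping, and crop before inference. A minimal sketch; the skip interval and ROI coordinates are placeholders to tune per camera:

```python
import numpy as np

def prepare(frame_iter, every_nth=3, roi=(100, 50, 540, 430)):
    """Yield only every Nth frame, cropped to the region of interest.

    roi = (x1, y1, x2, y2) in pixels; OpenCV frames index as [y, x].
    Skipped frames cost nothing, and smaller crops mean faster inference.
    """
    x1, y1, x2, y2 = roi
    for i, frame in enumerate(frame_iter):
        if i % every_nth:
            continue
        yield frame[y1:y2, x1:x2]

# 9 stand-in 640x480 BGR frames -> 3 cropped frames reach the model
frames = (np.zeros((480, 640, 3), dtype=np.uint8) for _ in range(9))
crops = list(prepare(frames))
print(len(crops), crops[0].shape)  # 3 (380, 440, 3)
```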
Edge AI is one of the fastest-growing fields in tech. At TechSaaS, we help companies deploy computer vision solutions on cost-effective hardware — no cloud dependency, no per-inference fees, no data leaving the premises.