Kubernetes Operators for Custom Resources: A Practical Guide
Learn how to build Kubernetes Operators that manage custom resources. From CRDs to Operator SDK, this guide covers reconciliation loops, status...
What Are Kubernetes Operators?
Kubernetes Operators extend the Kubernetes API to manage complex, stateful applications using custom resources. Instead of writing shell scripts or manual runbooks, you encode operational knowledge into code that Kubernetes runs continuously.
An Operator watches for changes to Custom Resources (CRs) and reconciles the actual state of your cluster with the desired state you declared. Think of it as a robot SRE that never sleeps.
The Operator Pattern
The Operator pattern consists of three pieces:
- Custom Resource Definition (CRD): Extends the Kubernetes API with your own resource types
- Custom Resource (CR): An instance of your CRD — the desired state
- Controller: Code that watches CRs and reconciles actual state to match desired state
User creates CR → Controller watches → Reconcile loop runs → Actual state matches desired state
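To make the flow above concrete, here is a minimal, dependency-free sketch of a level-triggered reconcile loop. It does not talk to Kubernetes: the desiredState and cluster types and the reconcile function are hypothetical stand-ins for a CR spec, the real cluster, and a controller.

```go
package main

import "fmt"

// desiredState is a hypothetical stand-in for a CR's spec.
type desiredState struct {
	Replicas int
}

// cluster is a hypothetical stand-in for the actual state of the world.
type cluster struct {
	RunningReplicas int
}

// reconcile moves the actual state one step toward the desired state and
// reports whether anything changed. It is level-triggered: it looks only at
// the current desired and actual state, never at which event woke it up.
func reconcile(want desiredState, c *cluster) bool {
	switch {
	case c.RunningReplicas < want.Replicas:
		c.RunningReplicas++ // scale up by one
		return true
	case c.RunningReplicas > want.Replicas:
		c.RunningReplicas-- // scale down by one
		return true
	default:
		return false // already converged
	}
}

func main() {
	want := desiredState{Replicas: 3}
	c := &cluster{}
	for reconcile(want, c) {
		fmt.Printf("running=%d desired=%d\n", c.RunningReplicas, want.Replicas)
	}
	fmt.Println("converged")
}
```

Because each pass compares state rather than replaying events, running the loop again after convergence is a no-op, which is the property a real controller relies on.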
Building a CRD
Let us build an Operator that manages PostgreSQL databases. First, define the CRD:
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: databases.techsaas.cloud
spec:
  group: techsaas.cloud
  versions:
    - name: v1alpha1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                engine:
                  type: string
                  enum: ["postgresql", "mysql", "mongodb"]
                version:
                  type: string
                storage:
                  type: string
                replicas:
                  type: integer
                  minimum: 1
                  maximum: 5
            status:
              type: object
              properties:
                phase:
                  type: string
                connectionString:
                  type: string
                readyReplicas:
                  type: integer
      subresources:
        status: {}
  scope: Namespaced
  names:
    plural: databases
    singular: database
    kind: Database
    shortNames:
      - db
Now users can create databases declaratively:
apiVersion: techsaas.cloud/v1alpha1
kind: Database
metadata:
  name: my-app-db
  namespace: production
spec:
  engine: postgresql
  version: "16"
  storage: 10Gi
  replicas: 2
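The API server enforces the CRD schema above on every create and update, but it can help to mirror the same rules in code, for example in unit tests or a validating webhook. The following is a sketch under assumptions: the DatabaseSpec struct and validate function are hypothetical names, and unlike the schema (which only checks fields that are present) this version treats a zero replica count as invalid.

```go
package main

import (
	"errors"
	"fmt"
)

// DatabaseSpec mirrors the spec fields declared in the CRD schema.
type DatabaseSpec struct {
	Engine   string
	Version  string
	Storage  string
	Replicas int
}

// allowedEngines matches the enum in the CRD schema.
var allowedEngines = map[string]bool{"postgresql": true, "mysql": true, "mongodb": true}

// validate applies the same constraints as the schema: engine must be one of
// the enum values and replicas must be between 1 and 5. All violations are
// collected and returned together.
func validate(s DatabaseSpec) error {
	var errs []error
	if !allowedEngines[s.Engine] {
		errs = append(errs, fmt.Errorf("engine %q is not supported", s.Engine))
	}
	if s.Replicas < 1 || s.Replicas > 5 {
		errs = append(errs, fmt.Errorf("replicas %d is outside the range 1 to 5", s.Replicas))
	}
	return errors.Join(errs...) // nil when errs is empty
}

func main() {
	ok := DatabaseSpec{Engine: "postgresql", Version: "16", Storage: "10Gi", Replicas: 2}
	bad := DatabaseSpec{Engine: "redis", Replicas: 9}
	fmt.Println("valid spec:", validate(ok))
	fmt.Println("invalid spec:", validate(bad))
}
```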
The Reconciliation Loop
The heart of every Operator is the reconcile function. Using the Operator SDK with Go:
// Assumes the usual imports: context, fmt, time, appsv1 (k8s.io/api/apps/v1),
// apierrors (k8s.io/apimachinery/pkg/api/errors), k8s.io/apimachinery/pkg/types,
// ctrl (sigs.k8s.io/controller-runtime), its pkg/log, and the generated v1alpha1 package.
func (r *DatabaseReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	logger := log.FromContext(ctx)

	// 1. Fetch the Database CR
	var database v1alpha1.Database
	if err := r.Get(ctx, req.NamespacedName, &database); err != nil {
		if apierrors.IsNotFound(err) {
			// CR was deleted; children with owner references are garbage-collected
			return ctrl.Result{}, nil
		}
		return ctrl.Result{}, err
	}

	// 2. Check if StatefulSet exists
	var sts appsv1.StatefulSet
	err := r.Get(ctx, types.NamespacedName{
		Name:      database.Name + "-db",
		Namespace: database.Namespace,
	}, &sts)
	if apierrors.IsNotFound(err) {
		// 3. Create StatefulSet if it does not exist
		sts = r.buildStatefulSet(&database)
		if err := r.Create(ctx, &sts); err != nil {
			return ctrl.Result{}, err
		}
		logger.Info("Created StatefulSet", "name", sts.Name)
	} else if err != nil {
		// Any other lookup error: return it so the request is retried
		return ctrl.Result{}, err
	}

	// 4. Create Service for the database
	if err := r.ensureService(ctx, &database); err != nil {
		return ctrl.Result{}, err
	}

	// 5. Create Secret with connection string
	if err := r.ensureSecret(ctx, &database); err != nil {
		return ctrl.Result{}, err
	}

	// 6. Update status (user:pass is a placeholder; real credentials belong in the Secret)
	database.Status.Phase = "Running"
	database.Status.ReadyReplicas = sts.Status.ReadyReplicas
	database.Status.ConnectionString = fmt.Sprintf(
		"postgresql://user:pass@%s-db.%s.svc:5432/app",
		database.Name, database.Namespace,
	)
	if err := r.Status().Update(ctx, &database); err != nil {
		return ctrl.Result{}, err
	}

	// 7. Requeue after 30 seconds for health check
	return ctrl.Result{RequeueAfter: 30 * time.Second}, nil
}
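The reconcile function above sets the phase to "Running" on every pass, even before any replica is ready. One way to make the status more informative is to derive the phase from replica counts. This is a sketch only: the derivePhase helper and the "Pending" and "Degraded" phase names are assumptions, since the article only uses "Running".

```go
package main

import "fmt"

// derivePhase maps replica counts to a status phase. Only "Running" appears
// in the article; "Pending" and "Degraded" are hypothetical names.
func derivePhase(desired, ready int32) string {
	switch {
	case ready == 0:
		return "Pending" // nothing is serving yet
	case ready < desired:
		return "Degraded" // partially available
	default:
		return "Running" // all requested replicas are ready
	}
}

func main() {
	for _, ready := range []int32{0, 1, 2} {
		fmt.Printf("ready=%d/2 phase=%s\n", ready, derivePhase(2, ready))
	}
}
```

In the reconciler, the result would replace the hard-coded assignment to database.Status.Phase, with the desired count taken from the spec.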
Key Reconciliation Patterns
Idempotency
Your reconcile function will be called many times. Every operation must be idempotent:
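// BAD: tries to create the Service on every reconcile, producing
// duplicates or AlreadyExists errors
func (r *Reconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	return ctrl.Result{}, r.Create(ctx, newService())
}

// GOOD: Check-then-create pattern
func (r *Reconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	var svc corev1.Service
	err := r.Get(ctx, req.NamespacedName, &svc)
	if apierrors.IsNotFound(err) {
		// Only creates if missing
		return ctrl.Result{}, r.Create(ctx, newService())
	}
	// nil when the Service already exists; any other error triggers a retry
	return ctrl.Result{}, err
}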
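The check-then-create pattern can be exercised without a cluster. The sketch below uses a hypothetical in-memory fakeStore in place of the API server and calls a hypothetical ensureService helper several times to show that only one create happens. It also treats an already-exists error as success, because another writer can create the object between the check and the create.

```go
package main

import (
	"errors"
	"fmt"
)

var errAlreadyExists = errors.New("already exists")

// fakeStore is a hypothetical in-memory stand-in for the API server.
type fakeStore struct {
	services map[string]bool
	creates  int // number of successful create calls
}

func (s *fakeStore) exists(name string) bool { return s.services[name] }

func (s *fakeStore) create(name string) error {
	if s.services[name] {
		return errAlreadyExists
	}
	s.services[name] = true
	s.creates++
	return nil
}

// ensureService is the check-then-create step. A concurrent writer can still
// win the race between exists and create, so AlreadyExists counts as success.
func ensureService(s *fakeStore, name string) error {
	if s.exists(name) {
		return nil
	}
	if err := s.create(name); err != nil && !errors.Is(err, errAlreadyExists) {
		return err
	}
	return nil
}

func main() {
	store := &fakeStore{services: map[string]bool{}}
	for i := 0; i < 3; i++ {
		if err := ensureService(store, "my-app-db"); err != nil {
			fmt.Println("reconcile failed:", err)
		}
	}
	fmt.Println("create calls:", store.creates) // prints "create calls: 1"
}
```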
Owner References
Set owner references so child resources are garbage-collected when the parent CR is deleted:
// SetControllerReference returns an error; call it before Create and handle the result
err := ctrl.SetControllerReference(&database, &statefulSet, r.Scheme)
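To illustrate what owner references buy you, here is a toy model of cascading deletion. It is not how the Kubernetes garbage collector is implemented: the object type and collectGarbage function are hypothetical, and objects are identified by plain string keys rather than UIDs.

```go
package main

import "fmt"

// object is a hypothetical, simplified resource that only records the keys
// of its owners. An object without owners is a root object.
type object struct {
	Owners []string
}

// collectGarbage removes every object whose owners have all disappeared and
// repeats until nothing changes, so grandchildren are collected as well.
func collectGarbage(objs map[string]object) {
	for changed := true; changed; {
		changed = false
		for key, o := range objs {
			if len(o.Owners) == 0 {
				continue // root objects are never collected
			}
			alive := false
			for _, owner := range o.Owners {
				if _, ok := objs[owner]; ok {
					alive = true
					break
				}
			}
			if !alive {
				delete(objs, key) // deleting while ranging over a map is safe in Go
				changed = true
			}
		}
	}
}

func main() {
	objs := map[string]object{
		"database/my-app-db":       {},
		"statefulset/my-app-db-db": {Owners: []string{"database/my-app-db"}},
		"pod/my-app-db-db-0":       {Owners: []string{"statefulset/my-app-db-db"}},
	}
	delete(objs, "database/my-app-db") // the user deletes the CR
	collectGarbage(objs)
	fmt.Println("remaining objects:", len(objs)) // prints "remaining objects: 0"
}
```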
Status Subresource
Always report status back to the user. This lets them run kubectl get databases and see the state at a glance (columns beyond NAME and AGE must be declared as additionalPrinterColumns in the CRD):
| NAME | ENGINE | REPLICAS | PHASE | AGE |
|---|---|---|---|---|
| my-app-db | postgresql | 2/2 | Running | 5m |
| analytics | mongodb | 1/1 | Running | 2d |
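When the CRD is generated with make manifests rather than written by hand, printer columns are usually declared with Kubebuilder markers on the Go type. The sketch below shows what that could look like for this resource. It is simplified so that it compiles on its own: a real API type lives in the generated api/v1alpha1 package and also embeds metav1.TypeMeta and metav1.ObjectMeta. Each printer column maps to a single JSONPath, so a combined 2/2 column like the one in the table would need its own status field; this sketch shows the ready count only.

```go
package main

import "fmt"

// DatabaseSpec mirrors the spec fields in the CRD schema.
type DatabaseSpec struct {
	Engine   string `json:"engine,omitempty"`
	Version  string `json:"version,omitempty"`
	Storage  string `json:"storage,omitempty"`
	Replicas int32  `json:"replicas,omitempty"`
}

// DatabaseStatus mirrors the status fields in the CRD schema.
type DatabaseStatus struct {
	Phase            string `json:"phase,omitempty"`
	ConnectionString string `json:"connectionString,omitempty"`
	ReadyReplicas    int32  `json:"readyReplicas,omitempty"`
}

// +kubebuilder:object:root=true
// +kubebuilder:subresource:status
// +kubebuilder:resource:shortName=db
// +kubebuilder:printcolumn:name="Engine",type="string",JSONPath=".spec.engine"
// +kubebuilder:printcolumn:name="Ready",type="integer",JSONPath=".status.readyReplicas"
// +kubebuilder:printcolumn:name="Phase",type="string",JSONPath=".status.phase"
// +kubebuilder:printcolumn:name="Age",type="date",JSONPath=".metadata.creationTimestamp"

// Database is the root type. TypeMeta and ObjectMeta are omitted here so the
// sketch has no dependencies outside the standard library.
type Database struct {
	Spec   DatabaseSpec   `json:"spec,omitempty"`
	Status DatabaseStatus `json:"status,omitempty"`
}

func main() {
	db := Database{
		Spec:   DatabaseSpec{Engine: "postgresql", Version: "16", Storage: "10Gi", Replicas: 2},
		Status: DatabaseStatus{Phase: "Running", ReadyReplicas: 2},
	}
	fmt.Printf("%s %d/%d %s\n", db.Spec.Engine, db.Status.ReadyReplicas, db.Spec.Replicas, db.Status.Phase)
}
```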
Operator SDK vs Kubebuilder vs KOPF
| Feature | Operator SDK (Go) | Kubebuilder | KOPF (Python) |
|---|---|---|---|
| Language | Go | Go | Python |
| Scaffolding | Yes (full) | Yes (full) | Minimal |
| Helm/Ansible | Yes | No | No |
| Maturity | Production | Production | Mature |
| Learning curve | Steep | Steep | Moderate |
| Performance | Excellent | Excellent | Good |
| Best for | Complex operators | K8s-native | Quick prototyping |
For production operators managing critical resources, Go with Operator SDK (whose Go scaffolding builds on Kubebuilder) is the usual choice. For internal tools and prototypes, KOPF with Python gets you running faster.
Scaffolding with Operator SDK
# Initialize project
operator-sdk init --domain techsaas.cloud --repo github.com/techsaas/db-operator
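# Create API and controller. The generated API group is <group>.<domain>, here db.techsaas.cloud,
# which differs from the techsaas.cloud group used in the hand-written CRD above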
operator-sdk create api --group db --version v1alpha1 --kind Database --resource --controller
# Generate CRD manifests
make manifests
# Build and push
make docker-build docker-push IMG=registry.techsaas.cloud/db-operator:v1
# Deploy to cluster
make deploy IMG=registry.techsaas.cloud/db-operator:v1
When To Build an Operator
Build an Operator when:
- You manage stateful applications that need lifecycle automation
- You have day-2 operations (backups, scaling, upgrades) that are currently manual
- Multiple teams need self-service access to provision resources
- You want to encode operational knowledge into version-controlled code
Do not build an Operator when:
- A Helm chart with values is sufficient
- The application is stateless and simple
- You are the only operator and kubectl commands suffice
Production Considerations
- RBAC: Grant the operator only the permissions it needs, bound to a dedicated ServiceAccount.
- Leader election: When running multiple replicas for HA, only one controller instance should reconcile at a time.
- Metrics: Expose Prometheus metrics for reconcile duration, errors, and queue depth.
- Finalizers: Use finalizers for cleanup tasks that must complete before the CR is deleted.
- Webhook validation: Add admission webhooks to validate CRs before they are created or updated.
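Of the items above, finalizers are the one most often implemented incorrectly, so here is a dependency-free sketch of the usual flow: add the finalizer while the object is live, run cleanup once deletion has been requested, and remove the finalizer only after cleanup succeeds. The resource type, the reconcileFinalizer function, and the finalizer name are all hypothetical; a real controller would work on the CR's metadata through the API client.

```go
package main

import (
	"errors"
	"fmt"
)

// finalizerName is a hypothetical finalizer key for this operator.
const finalizerName = "techsaas.cloud/backup-cleanup"

// resource is a simplified stand-in for a CR's metadata.
type resource struct {
	Finalizers []string
	Deleting   bool // true once a deletion timestamp has been set
}

func hasFinalizer(r *resource) bool {
	for _, f := range r.Finalizers {
		if f == finalizerName {
			return true
		}
	}
	return false
}

func removeFinalizer(r *resource) {
	kept := make([]string, 0, len(r.Finalizers))
	for _, f := range r.Finalizers {
		if f != finalizerName {
			kept = append(kept, f)
		}
	}
	r.Finalizers = kept
}

// reconcileFinalizer returns a short description of the action it took.
// If cleanup fails, the finalizer stays in place so the next pass retries.
func reconcileFinalizer(r *resource, cleanup func() error) (string, error) {
	switch {
	case !r.Deleting && !hasFinalizer(r):
		r.Finalizers = append(r.Finalizers, finalizerName)
		return "finalizer added", nil
	case r.Deleting && hasFinalizer(r):
		if err := cleanup(); err != nil {
			return "", err
		}
		removeFinalizer(r)
		return "cleanup done, finalizer removed", nil
	default:
		return "nothing to do", nil
	}
}

func main() {
	r := &resource{}
	failOnce := true
	cleanup := func() error {
		if failOnce {
			failOnce = false
			return errors.New("backup bucket not reachable")
		}
		return nil
	}
	fmt.Println(reconcileFinalizer(r, cleanup)) // finalizer added <nil>
	r.Deleting = true
	fmt.Println(reconcileFinalizer(r, cleanup)) // first cleanup attempt fails
	fmt.Println(reconcileFinalizer(r, cleanup)) // retry succeeds
}
```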
At TechSaaS, we build custom operators for clients who need platform-level automation. Whether it is database provisioning, certificate management, or environment cloning, operators turn manual toil into declarative infrastructure.