
Kubernetes Operators for Custom Resources: A Practical Guide

Learn how to build Kubernetes Operators that manage custom resources. From CRDs to Operator SDK, this guide covers reconciliation loops, status...

Yash Pritwani
16 min read

What Are Kubernetes Operators?

Kubernetes Operators extend the Kubernetes API to manage complex, stateful applications using custom resources. Instead of writing shell scripts or manual runbooks, you encode operational knowledge into code that Kubernetes runs continuously.

Figure: Microservices architecture: independent services communicate through an API gateway and event bus.

An Operator watches for changes to Custom Resources (CRs) and reconciles the actual state of your cluster with the desired state you declared. Think of it as a robot SRE that never sleeps.

The Operator Pattern

The Operator pattern consists of three pieces:

  1. Custom Resource Definition (CRD): Extends the Kubernetes API with your own resource types
  2. Custom Resource (CR): An instance of your CRD — the desired state
  3. Controller: Code that watches CRs and reconciles actual state to match desired state

User creates CR → Controller watches → Reconcile loop runs → Actual state matches desired state

Building a CRD

Let us build an Operator that manages PostgreSQL databases. First, define the CRD:

apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: databases.techsaas.cloud
spec:
  group: techsaas.cloud
  versions:
    - name: v1alpha1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                engine:
                  type: string
                  enum: ["postgresql", "mysql", "mongodb"]
                version:
                  type: string
                storage:
                  type: string
                replicas:
                  type: integer
                  minimum: 1
                  maximum: 5
            status:
              type: object
              properties:
                phase:
                  type: string
                connectionString:
                  type: string
                readyReplicas:
                  type: integer
      subresources:
        status: {}
  scope: Namespaced
  names:
    plural: databases
    singular: database
    kind: Database
    shortNames:
      - db


Now users can create databases declaratively:

apiVersion: techsaas.cloud/v1alpha1
kind: Database
metadata:
  name: my-app-db
  namespace: production
spec:
  engine: postgresql
  version: "16"
  storage: 10Gi
  replicas: 2
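The schema is enforced server-side: the API server rejects, for example, `engine: oracle` or `replicas: 7` before the controller ever sees the object. Checks the schema cannot express belong in an admission webhook instead. As a starting point, a sketch of the same rules in plain Go; `validateSpec` is a hypothetical helper, not something any tool generates:

```go
package main

import (
	"errors"
	"fmt"
)

// validateSpec is a hypothetical helper that re-states the CRD's schema rules
// (engine enum, replicas between 1 and 5) in Go. It reports every violation at
// once by joining the errors (errors.Join needs Go 1.20+).
func validateSpec(engine string, replicas int) error {
	var errs []error
	switch engine {
	case "postgresql", "mysql", "mongodb":
		// supported engine
	default:
		errs = append(errs, fmt.Errorf("spec.engine: unsupported value %q", engine))
	}
	if replicas < 1 || replicas > 5 {
		errs = append(errs, fmt.Errorf("spec.replicas: %d is outside [1, 5]", replicas))
	}
	return errors.Join(errs...) // nil when errs is empty
}

func main() {
	fmt.Println(validateSpec("postgresql", 2)) // <nil>
	fmt.Println(validateSpec("oracle", 7))
}
```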

The Reconciliation Loop

The heart of every Operator is the reconcile function. Using the Operator SDK with Go:

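package controller

import (
    "context"
    "fmt"
    "time"

    appsv1 "k8s.io/api/apps/v1"
    apierrors "k8s.io/apimachinery/pkg/api/errors"
    "k8s.io/apimachinery/pkg/types"
    ctrl "sigs.k8s.io/controller-runtime"
    "sigs.k8s.io/controller-runtime/pkg/log"

    // Path assumes the default Operator SDK (go/v4) project layout
    "github.com/techsaas/db-operator/api/v1alpha1"
)

// DatabaseReconciler and the buildStatefulSet, ensureService and ensureSecret
// helpers are assumed to be defined elsewhere in this package.
func (r *DatabaseReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    log := log.FromContext(ctx)

    // 1. Fetch the Database CR
    var database v1alpha1.Database
    if err := r.Get(ctx, req.NamespacedName, &database); err != nil {
        if apierrors.IsNotFound(err) {
            // CR was deleted; owned resources are garbage-collected
            // through their owner references
            return ctrl.Result{}, nil
        }
        return ctrl.Result{}, err
    }

    // 2. Check if StatefulSet exists
    var sts appsv1.StatefulSet
    err := r.Get(ctx, types.NamespacedName{
        Name:      database.Name + "-db",
        Namespace: database.Namespace,
    }, &sts)

    if apierrors.IsNotFound(err) {
        // 3. Create StatefulSet if it does not exist
        // (buildStatefulSet should set the controller owner reference)
        sts = r.buildStatefulSet(&database)
        if err := r.Create(ctx, &sts); err != nil {
            return ctrl.Result{}, err
        }
        log.Info("Created StatefulSet", "name", sts.Name)
    } else if err != nil {
        // Any other error: return it so the request is retried with backoff
        return ctrl.Result{}, err
    }

    // 4. Create Service for the database
    if err := r.ensureService(ctx, &database); err != nil {
        return ctrl.Result{}, err
    }

    // 5. Create Secret with connection string
    if err := r.ensureSecret(ctx, &database); err != nil {
        return ctrl.Result{}, err
    }

    // 6. Update status (simplified: Phase is set to Running even before
    // the StatefulSet reports ready replicas)
    database.Status.Phase = "Running"
    database.Status.ReadyReplicas = sts.Status.ReadyReplicas
    // Placeholder credentials; real ones belong only in the Secret from step 5
    database.Status.ConnectionString = fmt.Sprintf(
        "postgresql://user:pass@%s-db.%s.svc:5432/app",
        database.Name, database.Namespace,
    )
    if err := r.Status().Update(ctx, &database); err != nil {
        return ctrl.Result{}, err
    }

    // 7. Requeue after 30 seconds for health check
    return ctrl.Result{RequeueAfter: 30 * time.Second}, nil
}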
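Step 6 marks the database `Running` unconditionally. A sketch of deriving the phase from StatefulSet readiness instead, plus the in-cluster host name the connection string uses; `phaseFor`, `serviceHost`, and the `Provisioning` phase name are hypothetical:

```go
package main

import "fmt"

// phaseFor is a hypothetical replacement for the unconditional
// Phase = "Running": report Running only once every desired replica is ready.
func phaseFor(desired, ready int32) string {
	if ready >= desired {
		return "Running"
	}
	return "Provisioning"
}

// serviceHost rebuilds the in-cluster DNS name used in the connection string,
// assuming (as the Reconcile function does) a Service named "<cr-name>-db".
func serviceHost(name, namespace string) string {
	return fmt.Sprintf("%s-db.%s.svc", name, namespace)
}

func main() {
	fmt.Println(phaseFor(2, 1), phaseFor(2, 2))         // Provisioning Running
	fmt.Println(serviceHost("my-app-db", "production")) // my-app-db-db.production.svc
}
```

Inside Reconcile this would be called as `phaseFor(*sts.Spec.Replicas, sts.Status.ReadyReplicas)` after checking that `sts.Spec.Replicas` is non-nil. Note that with the `-db` suffix, the sample CR `my-app-db` ends up with the host `my-app-db-db.production.svc`.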

Key Reconciliation Patterns

Idempotency

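Your reconcile function will be called many times: on every change to a watched object, on requeues and periodic resyncs, and again after the operator restarts. Every operation must therefore be idempotent: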

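// BAD: blindly creates on every reconcile. With a fixed name the second
// call fails with AlreadyExists; with generateName it piles up duplicates.
func (r *Reconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    return ctrl.Result{}, r.Create(ctx, newService())
}

// GOOD: check-then-create pattern
// (newService returns the desired *corev1.Service; defined elsewhere)
func (r *Reconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    svc := newService()
    var existing corev1.Service
    err := r.Get(ctx, client.ObjectKeyFromObject(svc), &existing)
    if apierrors.IsNotFound(err) {
        return ctrl.Result{}, r.Create(ctx, svc) // Only creates if missing
    }
    return ctrl.Result{}, err // nil when the Service already exists
}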

Figure: Container orchestration distributes workloads across multiple nodes for resilience and scale.

Owner References

Set owner references so child resources are garbage-collected when the parent CR is deleted:

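// Call before r.Create so the StatefulSet is created with its owner set
if err := ctrl.SetControllerReference(&database, &statefulSet, r.Scheme); err != nil {
    return ctrl.Result{}, err
}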

Status Subresource

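Always report status back to users so that kubectl get databases shows the current state. Which columns appear is controlled by additionalPrinterColumns in the CRD (with Operator SDK, +kubebuilder:printcolumn markers on the Go type), so the CRD above would need those entries to produce output along these lines: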

NAME        ENGINE       REPLICAS   PHASE     AGE
my-app-db   postgresql   2/2        Running   5m
analytics   mongodb      1/1        Running   2d

Operator SDK vs Kubebuilder vs KOPF

Feature          Operator SDK (Go)   Kubebuilder   KOPF (Python)
Language         Go                  Go            Python
Scaffolding      Yes (full)          Yes (full)    Minimal
Helm/Ansible     Yes                 No            No
Maturity         Production          Production    Mature
Learning curve   Steep               Steep         Moderate
Performance      Excellent           Excellent     Good
Best for         Complex operators   K8s-native    Quick prototyping

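For production operators managing critical resources, Go is the usual choice. Operator SDK's Go support is built on Kubebuilder, so the two produce nearly identical projects; Operator SDK adds the Helm and Ansible options and extra tooling on top. For internal tools and prototypes, KOPF with Python gets you running faster.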

Scaffolding with Operator SDK

# Initialize project
operator-sdk init --domain techsaas.cloud --repo github.com/techsaas/db-operator

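# Create API and controller
# (--group db with --domain techsaas.cloud yields the API group db.techsaas.cloud,
# so the generated CRD is databases.db.techsaas.cloud, not databases.techsaas.cloud)
operator-sdk create api --group db --version v1alpha1 --kind Database --resource --controller

# Regenerate DeepCopy code and CRD manifests after editing the API types
make generate
make manifests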

# Build and push
make docker-build docker-push IMG=registry.techsaas.cloud/db-operator:v1

# Deploy to cluster
make deploy IMG=registry.techsaas.cloud/db-operator:v1


When To Build an Operator

Build an Operator when:

  • You manage stateful applications that need lifecycle automation
  • You have day-2 operations (backups, scaling, upgrades) that are currently manual
  • Multiple teams need self-service access to provision resources
  • You want to encode operational knowledge into version-controlled code

Do not build an Operator when:

  • A Helm chart with values is sufficient
  • The application is stateless and simple
  • You are the only person running the application and a few kubectl commands suffice

Figure: API gateway pattern: a single entry point handles auth, rate limiting, and routing to backend services.

Production Considerations

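  1. RBAC: Grant your operator only the permissions it needs, through a dedicated ServiceAccount (Operator SDK generates the Role from +kubebuilder:rbac markers)
  2. Leader election: For HA, run several replicas with leader election enabled so only one instance reconciles at a time (the scaffolded manager exposes a --leader-elect flag)
  3. Metrics: Expose Prometheus metrics for reconcile duration, errors, and queue depth; controller-runtime exports these by default, and you can add your own for domain-specific signals
  4. Finalizers: Use finalizers for cleanup tasks that must complete before the CR is deleted
  5. Webhook validation: Add admission webhooks to validate CRs before they are created or updated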

At TechSaaS, we build custom operators for clients who need platform-level automation. Whether it is database provisioning, certificate management, or environment cloning, operators turn manual toil into declarative infrastructure.
