Zero-Trust Security Architecture for Cloud-Native Apps

A hands-on guide to implementing Zero Trust in Kubernetes-based cloud-native applications. Covers service mesh mTLS, SPIFFE identity, OPA policy...

TechSaaS Team
13 min read

The Perimeter Doesn't Exist Anymore

In a traditional data center, security meant a firewall at the edge. Traffic inside the network was trusted. This worked when all your services ran on known machines in a known building.

[Diagram: security layers — Firewall, WAF, SSO/MFA, TLS/SSL, RBAC, Audit Logs]

Defense in depth: multiple security layers protect your infrastructure from threats.

In a cloud-native world running on Kubernetes, this model collapses:

  • Pods are ephemeral: A pod's IP address changes every time it restarts. Firewall rules based on IP are meaningless.
  • Clusters are multi-tenant: Multiple teams, services, and environments share infrastructure.
  • The network is hostile: East-west traffic between services can be intercepted. A compromised pod can attack its neighbors.
  • Supply chain attacks: A single compromised container image can provide an attacker with a foothold inside your cluster.

Zero Trust architecture assumes that every network request — even internal ones — might be malicious. Every request must be authenticated, authorized, and encrypted, regardless of where it originates.

Here's how to implement it in cloud-native applications.

The Five Pillars of Cloud-Native Zero Trust

┌───────────────────────────────────────────────────────────┐
│                  Cloud-Native Zero Trust                  │
├──────────────┬──────────────┬──────────────┬──────────────┤
│   Identity   │   Network    │    Policy    │   Secrets    │
│              │              │              │              │
│   SPIFFE/    │   mTLS,      │   OPA/       │   Vault/     │
│   SPIRE      │   Network    │   Gatekeeper │   ESO        │
│              │   Policies   │              │              │
├──────────────┴──────────────┴──────────────┴──────────────┤
│                   Observability & Audit                   │
│          Falco │ Audit Logs │ Network Flow Logs           │
└───────────────────────────────────────────────────────────┘

Pillar 1: Workload Identity with SPIFFE/SPIRE

SPIFFE (Secure Production Identity Framework For Everyone) provides cryptographic identity to every workload without relying on network location.

How SPIFFE Works

Every workload gets a SPIFFE ID:

spiffe://techsaas.cloud/ns/production/sa/payment-service
  │              │           │              │
  protocol    trust domain  namespace    service account
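A quick sanity-check parser for this layout (a hypothetical helper, not part of the SPIFFE libraries; it assumes the `/ns/<namespace>/sa/<service-account>` path convention used in this article, while SPIFFE itself allows arbitrary paths):

```python
from urllib.parse import urlparse

def parse_spiffe_id(spiffe_id: str) -> dict:
    """Split a Kubernetes-style SPIFFE ID into its components."""
    parsed = urlparse(spiffe_id)
    if parsed.scheme != "spiffe":
        raise ValueError("not a SPIFFE ID")
    parts = parsed.path.strip("/").split("/")
    if len(parts) != 4 or parts[0] != "ns" or parts[2] != "sa":
        raise ValueError("unexpected path layout")
    return {
        "trust_domain": parsed.netloc,    # techsaas.cloud
        "namespace": parts[1],            # production
        "service_account": parts[3],      # payment-service
    }

print(parse_spiffe_id("spiffe://techsaas.cloud/ns/production/sa/payment-service"))
```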

SPIRE (the SPIFFE Runtime Environment) issues short-lived X.509 certificates to workloads:

# SPIRE Server deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: spire-server
  namespace: spire
spec:
  replicas: 1
  selector:            # required for apps/v1 Deployments
    matchLabels:
      app: spire-server
  template:
    metadata:
      labels:
        app: spire-server
    spec:
      containers:
        - name: spire-server
          image: ghcr.io/spiffe/spire-server:1.9.0
          args:
            - -config
            - /run/spire/config/server.conf
          volumeMounts:
            - name: spire-config
              mountPath: /run/spire/config
            - name: spire-data
              mountPath: /run/spire/data

# server.conf — SPIRE Server configuration
server {
    bind_address = "0.0.0.0"
    bind_port = "8081"
    trust_domain = "techsaas.cloud"
    data_dir = "/run/spire/data"
    log_level = "INFO"
    ca_ttl = "24h"
    default_x509_svid_ttl = "1h"  # Short-lived certificates
}

plugins {
    DataStore "sql" {
        plugin_data {
            database_type = "postgres"
            connection_string = "dbname=spire host=postgres user=spire"
        }
    }

    NodeAttestor "k8s_psat" {
        plugin_data {
            clusters = {
                "production" = {
                    service_account_allow_list = ["spire:spire-agent"]
                }
            }
        }
    }

    KeyManager "disk" {
        plugin_data {
            keys_path = "/run/spire/data/keys.json"
        }
    }
}


Certificates are automatically rotated every hour. If a workload is compromised, the blast radius is limited to the certificate's remaining TTL.
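A back-of-the-envelope way to see what the short TTL buys you (illustrative only — SPIRE handles rotation itself):

```python
from datetime import datetime, timedelta, timezone

def remaining_exposure(issued_at: datetime, ttl: timedelta, now: datetime) -> timedelta:
    """How long a stolen SVID stays usable: the certificate's remaining TTL."""
    expires_at = issued_at + ttl
    return max(expires_at - now, timedelta(0))

issued = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
# Compromise detected 40 minutes into a 1h SVID's lifetime:
print(remaining_exposure(issued, timedelta(hours=1), issued + timedelta(minutes=40)))
# 0:20:00 of exposure — versus days or months for a long-lived static credential
```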

Registering Workloads

# Register the payment service identity
spire-server entry create \
    -spiffeID spiffe://techsaas.cloud/ns/production/sa/payment-service \
    -parentID spiffe://techsaas.cloud/ns/spire/sa/spire-agent \
    -selector k8s:ns:production \
    -selector k8s:sa:payment-service

# Register the order service identity
spire-server entry create \
    -spiffeID spiffe://techsaas.cloud/ns/production/sa/order-service \
    -parentID spiffe://techsaas.cloud/ns/spire/sa/spire-agent \
    -selector k8s:ns:production \
    -selector k8s:sa:order-service

Selectors bind SPIFFE IDs to Kubernetes attributes (namespace, service account, pod labels). A workload gets its identity only if it matches the registered selectors.
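The matching step reduces to a subset test: an entry's SPIFFE ID is issued only when every registered selector appears among the workload's attested attributes. A simplified model (not SPIRE's actual implementation):

```python
def matches_entry(entry_selectors: set[str], workload_attributes: set[str]) -> bool:
    """A workload receives the entry's SPIFFE ID only if ALL of the
    entry's selectors appear among its attested attributes."""
    return entry_selectors <= workload_attributes  # subset test

payment_entry = {"k8s:ns:production", "k8s:sa:payment-service"}

# Right namespace and service account (extra attributes are fine):
good_pod = {"k8s:ns:production", "k8s:sa:payment-service", "k8s:pod-label:app:payments"}
# Same service account name, wrong namespace:
bad_pod = {"k8s:ns:staging", "k8s:sa:payment-service"}

print(matches_entry(payment_entry, good_pod))  # True
print(matches_entry(payment_entry, bad_pod))   # False
```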

Pillar 2: mTLS Everywhere with Service Mesh

Mutual TLS (mTLS) ensures every service-to-service call is encrypted and authenticated in both directions.

Istio Service Mesh Configuration

# Enforce mTLS cluster-wide
apiVersion: security.istio.io/v1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system
spec:
  mtls:
    mode: STRICT  # Reject any non-mTLS traffic

---
# Authorization policy: payment-service can only be called by order-service
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: payment-service-policy
  namespace: production
spec:
  selector:
    matchLabels:
      app: payment-service
  action: ALLOW
  rules:
    - from:
        - source:
            principals:
              - "cluster.local/ns/production/sa/order-service"
              - "cluster.local/ns/production/sa/checkout-service"
      to:
        - operation:
            methods: ["POST"]
            paths: ["/api/v1/payments/*"]
    - from:
        - source:
            principals:
              - "cluster.local/ns/monitoring/sa/prometheus"
      to:
        - operation:
            methods: ["GET"]
            paths: ["/metrics"]

This configuration:

  1. Requires mTLS for all communication cluster-wide
  2. Allows only order-service and checkout-service to call the payment API
  3. Allows only Prometheus to scrape metrics
  4. Denies everything else by default
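The decision logic those four rules encode can be modeled in a few lines (a simplified sketch of the ALLOW semantics, not Istio's evaluator):

```python
import fnmatch

# (principals, methods, path patterns) tuples mirroring the policy above
RULES = [
    ({"cluster.local/ns/production/sa/order-service",
      "cluster.local/ns/production/sa/checkout-service"},
     {"POST"}, ["/api/v1/payments/*"]),
    ({"cluster.local/ns/monitoring/sa/prometheus"},
     {"GET"}, ["/metrics"]),
]

def allowed(principal: str, method: str, path: str) -> bool:
    """ALLOW if any rule matches; everything else is denied by default."""
    return any(
        principal in principals and method in methods
        and any(fnmatch.fnmatch(path, pat) for pat in patterns)
        for principals, methods, patterns in RULES
    )

print(allowed("cluster.local/ns/production/sa/order-service",
              "POST", "/api/v1/payments/charge"))   # True
print(allowed("cluster.local/ns/production/sa/frontend",
              "POST", "/api/v1/payments/charge"))   # False — not in the allow list
```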

Cilium (eBPF-Based Alternative)

For teams wanting service mesh security without sidecar proxies:

apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: payment-service-l7
  namespace: production
spec:
  endpointSelector:
    matchLabels:
      app: payment-service
  ingress:
    - fromEndpoints:
        - matchLabels:
            app: order-service
      toPorts:
        - ports:
            - port: "8080"
              protocol: TCP
          rules:
            http:
              - method: POST
                path: "/api/v1/payments/.*"
    - fromEndpoints:
        - matchLabels:
            app: prometheus
      toPorts:
        - ports:
            - port: "9090"
              protocol: TCP
          rules:
            http:
              - method: GET
                path: "/metrics"

Cilium enforces L7 (HTTP) policies at the kernel level using eBPF, without sidecar containers. Lower overhead, same security guarantees.

Pillar 3: Policy Enforcement with OPA/Gatekeeper

Open Policy Agent (OPA) provides fine-grained policy enforcement across the entire stack.

Kubernetes Admission Policies

# Gatekeeper constraint template: require non-root containers
apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  name: k8srequirednonroot
spec:
  crd:
    spec:
      names:
        kind: K8sRequiredNonRoot
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8srequirednonroot

        violation[{"msg": msg}] {
          container := input.review.object.spec.containers[_]
          not container.securityContext.runAsNonRoot
          msg := sprintf("Container %v must set securityContext.runAsNonRoot=true", [container.name])
        }

        violation[{"msg": msg}] {
          container := input.review.object.spec.containers[_]
          container.securityContext.runAsUser == 0
          msg := sprintf("Container %v must not run as root (UID 0)", [container.name])
        }

---
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredNonRoot
metadata:
  name: require-non-root
spec:
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Pod"]
    namespaces:
      - production
      - staging

[Diagram: User → Identity Verify → Policy Engine → Access Proxy → App, with MFA + device checks, least privilege, and an encrypted tunnel — "Never Trust, Always Verify"]

Zero Trust architecture: every request is verified through identity, policy, and access proxy layers.
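The two violations the Rego template raises can be expressed as a plain admission check (illustrative Python, not Gatekeeper itself):

```python
def non_root_violations(pod_spec: dict) -> list[str]:
    """Mirror the two Rego rules: every container must set
    runAsNonRoot and must not request UID 0."""
    msgs = []
    for c in pod_spec.get("containers", []):
        sc = c.get("securityContext", {})
        if not sc.get("runAsNonRoot"):
            msgs.append(f"Container {c['name']} must set securityContext.runAsNonRoot=true")
        if sc.get("runAsUser") == 0:
            msgs.append(f"Container {c['name']} must not run as root (UID 0)")
    return msgs

spec = {"containers": [
    {"name": "app", "securityContext": {"runAsNonRoot": True, "runAsUser": 1000}},
    {"name": "sidecar", "securityContext": {"runAsUser": 0}},
]}
for msg in non_root_violations(spec):
    print(msg)  # only the sidecar container is flagged, twice
```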

API-Level Authorization

OPA can also enforce fine-grained API authorization:

# policy.rego — API authorization
package authz

import rego.v1

default allow := false

# Allow authenticated users to read their own data
allow if {
    input.method == "GET"
    input.path = ["api", "v1", "users", user_id]
    input.user.id == user_id
}

# Allow admins to read any user data
allow if {
    input.method == "GET"
    startswith(input.path[0], "api")
    "admin" in input.user.roles
}

# Allow order-service to create payments on the payment API
allow if {
    input.method == "POST"
    input.path == ["api", "v1", "payments"]
    input.caller.spiffe_id == "spiffe://techsaas.cloud/ns/production/sa/order-service"
}

# Allow destructive operations only during business hours; outside the
# 08:00-22:00 window, the default deny blocks them. Note the role check:
# without it, any caller could DELETE or PUT during business hours.
allow if {
    input.method in ["DELETE", "PUT"]
    "admin" in input.user.roles
    time.clock(time.now_ns())[0] >= 8   # after 8 AM
    time.clock(time.now_ns())[0] < 22   # before 10 PM
}

Integrate OPA with your API gateway or as a sidecar:

import requests
from functools import wraps
from flask import request, abort

OPA_URL = "http://localhost:8181/v1/data/authz/allow"

def require_authz(f):
    @wraps(f)
    def decorated(*args, **kwargs):
        opa_input = {
            "input": {
                "method": request.method,
                "path": request.path.strip("/").split("/"),
                # get_user_from_jwt / get_spiffe_id are app-specific helpers
                # (JWT parsing and mTLS peer-certificate inspection, not shown)
                "user": get_user_from_jwt(request),
                "caller": get_spiffe_id(request),
            }
        }
        # Fail closed: an unreachable or slow OPA means access denied
        try:
            result = requests.post(OPA_URL, json=opa_input, timeout=2).json()
        except requests.RequestException:
            abort(403, "Access denied by policy")
        if not result.get("result", False):
            abort(403, "Access denied by policy")
        return f(*args, **kwargs)
    return decorated

Pillar 4: Secrets Management

HashiCorp Vault with Kubernetes

# External Secrets Operator — sync Vault secrets to Kubernetes
apiVersion: external-secrets.io/v1beta1
kind: SecretStore
metadata:
  name: vault-backend
  namespace: production
spec:
  provider:
    vault:
      server: "http://vault.vault:8200"
      path: "secret"
      version: "v2"
      auth:
        kubernetes:
          mountPath: "kubernetes"
          role: "production-apps"

---
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: payment-service-secrets
  namespace: production
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: vault-backend
    kind: SecretStore
  target:
    name: payment-service-secrets
    creationPolicy: Owner
  data:
    - secretKey: DB_PASSWORD
      remoteRef:
        key: production/payment-service
        property: db_password
    - secretKey: STRIPE_API_KEY
      remoteRef:
        key: production/payment-service
        property: stripe_api_key

Secrets are never stored in Kubernetes manifests or environment variable definitions. They're synced from Vault at runtime and automatically rotated.
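On the consuming side, the synced Secret is typically mounted as a volume and re-read on each use, so a rotated value is picked up without a pod restart. A minimal sketch (the mount path is whatever your Deployment configures; `/var/run/secrets/app` is an assumption here):

```python
from pathlib import Path

def read_secret(name: str, mount_dir: str = "/var/run/secrets/app") -> str:
    """Read a secret key from its mounted file on every use, so a
    rotated value is picked up without restarting the pod."""
    return Path(mount_dir, name).read_text().strip()

# Usage inside the pod:
#   db_password = read_secret("DB_PASSWORD")
```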

Pod Security Standards

# Enforce restricted pod security
apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted

---
# Compliant pod spec
apiVersion: v1
kind: Pod
metadata:
  name: payment-service
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 1000
    fsGroup: 1000
    seccompProfile:
      type: RuntimeDefault
  containers:
    - name: app
      image: registry.techsaas.cloud/payment-service:v2.1.0@sha256:abc123...
      securityContext:
        allowPrivilegeEscalation: false
        readOnlyRootFilesystem: true
        capabilities:
          drop: ["ALL"]
      resources:
        limits:
          cpu: "500m"
          memory: "256Mi"
        requests:
          cpu: "100m"
          memory: "128Mi"
      volumeMounts:
        - name: tmp
          mountPath: /tmp
  volumes:
    - name: tmp
      emptyDir:
        sizeLimit: 100Mi

Security features:

  • readOnlyRootFilesystem: Prevents writing to the container filesystem (limits attacker capability)
  • capabilities.drop: ["ALL"]: Removes all Linux capabilities
  • allowPrivilegeEscalation: false: Prevents gaining additional privileges
  • seccompProfile: RuntimeDefault: Restricts system calls
  • Image pinned by digest (@sha256:...): Prevents supply chain substitution
  • Resource limits: Prevents resource exhaustion attacks

Pillar 5: Observability and Audit


Runtime Threat Detection with Falco

# Falco rules for Zero Trust violation detection
- rule: Unexpected Network Connection
  desc: Detect connections to unexpected external endpoints
  condition: >
    evt.type in (connect, sendto) and
    fd.net != "127.0.0.0/8" and
    fd.net != "10.0.0.0/8" and
    fd.net != "172.16.0.0/12" and
    k8s.ns.name = "production"
  output: >
    Unexpected outbound connection
    (pod=%k8s.pod.name namespace=%k8s.ns.name
     dest=%fd.name user=%user.name command=%proc.cmdline)
  priority: WARNING

- rule: Shell in Container
  desc: Detect interactive shell in production containers
  condition: >
    spawned_process and
    container and
    proc.name in (bash, sh, zsh, dash) and
    proc.tty != 0 and
    k8s.ns.name = "production"
  output: >
    Shell spawned in production container
    (pod=%k8s.pod.name command=%proc.cmdline user=%user.name)
  priority: CRITICAL

- rule: Sensitive File Access
  desc: Detect access to sensitive paths
  condition: >
    open_read and
    container and
    (fd.name startswith /etc/shadow or
     fd.name startswith /etc/kubernetes or
     fd.name startswith /var/run/secrets)
  output: >
    Sensitive file accessed
    (pod=%k8s.pod.name file=%fd.name command=%proc.cmdline)
  priority: CRITICAL
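The sensitive-path test in that last rule reduces to a simple prefix check (illustrative Python, useful for unit-testing the path list you feed Falco):

```python
# Prefixes mirroring the Falco rule above
SENSITIVE_PREFIXES = ("/etc/shadow", "/etc/kubernetes", "/var/run/secrets")

def is_sensitive(path: str) -> bool:
    """Flag file opens under the paths the Falco rule watches."""
    return path.startswith(SENSITIVE_PREFIXES)

print(is_sensitive("/var/run/secrets/kubernetes.io/serviceaccount/token"))  # True
print(is_sensitive("/var/log/app.log"))                                     # False
```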

Kubernetes Audit Logging

# Audit policy — log security-relevant events
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
  # Log all changes to secrets
  - level: RequestResponse
    resources:
      - group: ""
        resources: ["secrets"]
    verbs: ["create", "update", "patch", "delete"]

  # Log all RBAC changes
  - level: RequestResponse
    resources:
      - group: "rbac.authorization.k8s.io"
        resources: ["clusterroles", "clusterrolebindings", "roles", "rolebindings"]

  # Log exec into pods
  - level: RequestResponse
    resources:
      - group: ""
        resources: ["pods/exec", "pods/attach"]

  # Log service account token creation
  - level: Metadata
    resources:
      - group: ""
        resources: ["serviceaccounts/token"]

Network Policies: The First Step

If you implement nothing else, implement network policies. They're built into Kubernetes and provide immediate value:

# Default deny all ingress and egress
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: production
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress

---
# Allow specific communication patterns
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: payment-service-network
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: payment-service
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: order-service
      ports:
        - port: 8080
          protocol: TCP
  egress:
    # Allow database access
    - to:
        - podSelector:
            matchLabels:
              app: postgres
      ports:
        - port: 5432
          protocol: TCP
    # Allow DNS
    - to:
        - namespaceSelector: {}
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - port: 53
          protocol: UDP
        - port: 53
          protocol: TCP

Start with default-deny, then explicitly allow required communication paths. This immediately prevents lateral movement from compromised pods.
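The default-deny-plus-allowlist model is easy to reason about as a set of permitted edges (a toy model of the policies above, not how a CNI actually evaluates them):

```python
# (source app label, destination app label, port) edges explicitly allowed,
# mirroring the NetworkPolicy above
ALLOWED_EDGES = {
    ("order-service", "payment-service", 8080),
    ("payment-service", "postgres", 5432),
}

def traffic_permitted(src: str, dst: str, port: int) -> bool:
    """Default deny: traffic flows only along an explicitly allowed edge."""
    return (src, dst, port) in ALLOWED_EDGES

print(traffic_permitted("order-service", "payment-service", 8080))    # True
print(traffic_permitted("compromised-pod", "payment-service", 8080))  # False — lateral movement blocked
```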

Implementation Priority

Don't try to implement everything at once. Follow this order:

Week 1-2: Network Policies
  - Default deny in production namespace
  - Allow known communication patterns
  - Immediate lateral movement prevention

Week 3-4: Pod Security Standards
  - Enforce restricted policy on production namespace
  - Non-root containers, read-only filesystems
  - Drop all capabilities

Month 2: mTLS with Service Mesh or Cilium
  - Encrypt all east-west traffic
  - Service identity authentication
  - L7 authorization policies

Month 3: Secrets Management
  - External Secrets Operator + Vault
  - Rotate all static secrets
  - Remove secrets from environment variables

Month 4: Policy Enforcement
  - OPA/Gatekeeper admission policies
  - Image signing and verification
  - Runtime policies with Falco

Month 5-6: Observability and Audit
  - Kubernetes audit logging
  - Falco runtime detection
  - Network flow logging
  - Security dashboards and alerting

The Bottom Line

Zero Trust for cloud-native applications isn't a product you buy — it's an architecture you build, layer by layer. Start with network policies (immediate value, built into Kubernetes). Add mTLS for encrypted service communication. Enforce policies with OPA. Manage secrets with Vault. Monitor everything with Falco and audit logs.

The goal isn't to prevent all attacks — it's to minimize blast radius. A compromised pod in a Zero Trust cluster can't talk to services it shouldn't, can't read secrets it doesn't own, can't escalate privileges, and triggers alerts the moment it deviates from expected behavior.

That's the difference between a compromised pod being an incident and being a catastrophe.

#zero-trust #cloud-native #kubernetes #security #service-mesh
