Zero-Trust Security Architecture for Cloud-Native Apps
A hands-on guide to implementing Zero Trust in Kubernetes-based cloud-native applications. Covers service mesh mTLS, SPIFFE identity, OPA policy enforcement, secrets management, and runtime observability.
The Perimeter Doesn't Exist Anymore
In a traditional data center, security meant a firewall at the edge. Traffic inside the network was trusted. This worked when all your services ran on known machines in a known building.
Defense in depth: multiple security layers protect your infrastructure from threats.
In a cloud-native world running on Kubernetes, this model collapses:
- Pods are ephemeral: A pod's IP address changes every time it restarts. Firewall rules based on IP are meaningless.
- Clusters are multi-tenant: Multiple teams, services, and environments share infrastructure.
- The network is hostile: East-west traffic between services can be intercepted. A compromised pod can attack its neighbors.
- Supply chain attacks: A single compromised container image can provide an attacker with a foothold inside your cluster.
Zero Trust architecture assumes that every network request — even internal ones — might be malicious. Every request must be authenticated, authorized, and encrypted, regardless of where it originates.
Here's how to implement it in cloud-native applications.
The Five Pillars of Cloud-Native Zero Trust
┌─────────────────────────────────────────────────────┐
│ Cloud-Native Zero Trust │
├────────────┬────────────┬────────────┬──────────────┤
│ Identity │ Network │ Policy │ Secrets │
│ │ │ │ │
│ SPIFFE/ │ mTLS │ OPA/Gat │ Vault/ESO │
│ SPIRE │ Network │ ekeeper │ │
│ │ Policies │ │ │
├────────────┴────────────┴────────────┴──────────────┤
│ Observability & Audit │
│ Falco │ Audit Logs │ Network Flow Logs │
└─────────────────────────────────────────────────────┘
Pillar 1: Workload Identity with SPIFFE/SPIRE
SPIFFE (Secure Production Identity Framework For Everyone) provides cryptographic identity to every workload without relying on network location.
How SPIFFE Works
Every workload gets a SPIFFE ID:
spiffe://techsaas.cloud/ns/production/sa/payment-service

- Scheme: spiffe://
- Trust domain: techsaas.cloud
- Namespace: ns/production
- Service account: sa/payment-service
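The ID layout above can be unpacked mechanically. A minimal sketch (parse_spiffe_id is our helper, not part of any SPIFFE library, and it assumes the ns/.../sa/... path convention used in this article):

```python
from urllib.parse import urlparse

def parse_spiffe_id(spiffe_id: str) -> dict:
    """Split a Kubernetes-style SPIFFE ID into its components."""
    parsed = urlparse(spiffe_id)
    if parsed.scheme != "spiffe":
        raise ValueError(f"not a SPIFFE ID: {spiffe_id}")
    # Path convention used here: /ns/<namespace>/sa/<service-account>
    parts = parsed.path.strip("/").split("/")
    if len(parts) != 4 or parts[0] != "ns" or parts[2] != "sa":
        raise ValueError(f"unexpected SPIFFE path: {parsed.path}")
    return {
        "trust_domain": parsed.netloc,
        "namespace": parts[1],
        "service_account": parts[3],
    }
```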
SPIRE (the SPIFFE Runtime Environment) issues short-lived X.509 certificates to workloads:
# SPIRE Server deployment (ConfigMap and PVC names are illustrative)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: spire-server
  namespace: spire
spec:
  replicas: 1
  selector:
    matchLabels:
      app: spire-server
  template:
    metadata:
      labels:
        app: spire-server
    spec:
      containers:
        - name: spire-server
          image: ghcr.io/spiffe/spire-server:1.9.0
          args:
            - -config
            - /run/spire/config/server.conf
          volumeMounts:
            - name: spire-config
              mountPath: /run/spire/config
            - name: spire-data
              mountPath: /run/spire/data
      volumes:
        - name: spire-config
          configMap:
            name: spire-server
        - name: spire-data
          persistentVolumeClaim:
            claimName: spire-data
# server.conf — SPIRE Server configuration
server {
  bind_address = "0.0.0.0"
  bind_port = "8081"
  trust_domain = "techsaas.cloud"
  data_dir = "/run/spire/data"
  log_level = "INFO"
  ca_ttl = "24h"
  default_x509_svid_ttl = "1h"  # Short-lived certificates
}

plugins {
  DataStore "sql" {
    plugin_data {
      database_type = "postgres"
      connection_string = "dbname=spire host=postgres user=spire"
    }
  }
  NodeAttestor "k8s_psat" {
    plugin_data {
      clusters = {
        "production" = {
          service_account_allow_list = ["spire:spire-agent"]
        }
      }
    }
  }
  KeyManager "disk" {
    plugin_data {
      keys_path = "/run/spire/data/keys.json"
    }
  }
}
Certificates are automatically rotated every hour. If a workload is compromised, the blast radius is limited to the certificate's remaining TTL.
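The blast-radius arithmetic is simple enough to sketch; remaining_blast_radius is a hypothetical helper, not a SPIRE API:

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

def remaining_blast_radius(issued_at: datetime, ttl: timedelta,
                           now: Optional[datetime] = None) -> timedelta:
    """How long a stolen SVID remains usable: the unexpired part of its TTL."""
    now = now or datetime.now(timezone.utc)
    return max((issued_at + ttl) - now, timedelta(0))
```

With a 1h SVID TTL, a credential stolen 40 minutes after issuance is useful to an attacker for at most 20 more minutes.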
Registering Workloads
# Register the payment service identity
spire-server entry create \
-spiffeID spiffe://techsaas.cloud/ns/production/sa/payment-service \
-parentID spiffe://techsaas.cloud/ns/spire/sa/spire-agent \
-selector k8s:ns:production \
-selector k8s:sa:payment-service
# Register the order service identity
spire-server entry create \
-spiffeID spiffe://techsaas.cloud/ns/production/sa/order-service \
-parentID spiffe://techsaas.cloud/ns/spire/sa/spire-agent \
-selector k8s:ns:production \
-selector k8s:sa:order-service
Selectors bind SPIFFE IDs to Kubernetes attributes (namespace, service account, pod labels). A workload gets its identity only if it matches the registered selectors.
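SPIRE's attestation flow is more involved, but the core matching rule is just "every registered selector must be present on the workload". A toy mirror of that rule (our sketch, not a SPIRE API):

```python
def selectors_match(registered: set, workload: set) -> bool:
    """An entry matches only if the workload presents every registered selector."""
    return registered <= workload

# Selectors mirroring the payment-service entry registered above
entry_selectors = {"k8s:ns:production", "k8s:sa:payment-service"}
```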
Pillar 2: mTLS Everywhere with Service Mesh
Mutual TLS (mTLS) ensures every service-to-service call is encrypted and authenticated in both directions.
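At the TLS layer, "authenticated in both directions" means the server also demands a client certificate. A minimal sketch with Python's standard ssl module (the helper name and file paths are illustrative; in a service mesh the sidecar handles this for you):

```python
import ssl

def mtls_context(server_side: bool, certfile=None, keyfile=None, cafile=None):
    """Build an SSLContext that requires a certificate from BOTH peers."""
    proto = ssl.PROTOCOL_TLS_SERVER if server_side else ssl.PROTOCOL_TLS_CLIENT
    ctx = ssl.SSLContext(proto)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    if certfile:
        ctx.load_cert_chain(certfile, keyfile)  # our own identity
    if cafile:
        ctx.load_verify_locations(cafile)       # CA that signs peer certs
    ctx.verify_mode = ssl.CERT_REQUIRED         # reject peers without a cert
    return ctx
```

The server-side CERT_REQUIRED is what makes the TLS mutual: a plain HTTPS server would accept anonymous clients.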
Istio Service Mesh Configuration
# Enforce mTLS cluster-wide
apiVersion: security.istio.io/v1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system
spec:
  mtls:
    mode: STRICT  # Reject any non-mTLS traffic
---
# Authorization policy: only specific identities may call payment-service
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: payment-service-policy
  namespace: production
spec:
  selector:
    matchLabels:
      app: payment-service
  action: ALLOW
  rules:
    - from:
        - source:
            principals:
              - "cluster.local/ns/production/sa/order-service"
              - "cluster.local/ns/production/sa/checkout-service"
      to:
        - operation:
            methods: ["POST"]
            paths: ["/api/v1/payments/*"]
    - from:
        - source:
            principals:
              - "cluster.local/ns/monitoring/sa/prometheus"
      to:
        - operation:
            methods: ["GET"]
            paths: ["/metrics"]
This configuration:
- Requires mTLS for all communication cluster-wide
- Allows only order-service and checkout-service to call the payment API
- Allows only Prometheus to scrape metrics
- Denies everything else by default
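The allow-list semantics above can be mirrored in a few lines of Python for unit tests. This is our sketch, not an Istio API, and it simplifies: Istio's trailing * wildcard is treated as a plain path prefix.

```python
# Rules mirroring the AuthorizationPolicy: principal + method + path prefix
RULES = [
    {"principals": {"cluster.local/ns/production/sa/order-service",
                    "cluster.local/ns/production/sa/checkout-service"},
     "methods": {"POST"}, "path_prefix": "/api/v1/payments/"},
    {"principals": {"cluster.local/ns/monitoring/sa/prometheus"},
     "methods": {"GET"}, "path_prefix": "/metrics"},
]

def allowed(principal: str, method: str, path: str) -> bool:
    """ALLOW semantics: permitted only if some rule matches, denied otherwise."""
    return any(principal in r["principals"]
               and method in r["methods"]
               and path.startswith(r["path_prefix"])
               for r in RULES)
```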
Cilium (eBPF-Based Alternative)
For teams wanting service mesh security without sidecar proxies:
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: payment-service-l7
  namespace: production
spec:
  endpointSelector:
    matchLabels:
      app: payment-service
  ingress:
    - fromEndpoints:
        - matchLabels:
            app: order-service
      toPorts:
        - ports:
            - port: "8080"
              protocol: TCP
          rules:
            http:
              - method: POST
                path: "/api/v1/payments/.*"
    - fromEndpoints:
        - matchLabels:
            app: prometheus
      toPorts:
        - ports:
            - port: "9090"
              protocol: TCP
          rules:
            http:
              - method: GET
                path: "/metrics"
Cilium enforces L7 (HTTP) policies at the kernel level using eBPF, without sidecar containers, which keeps overhead low. Note that these policies authorize traffic but do not by themselves encrypt it; for encryption in transit, pair them with Cilium's transparent encryption (WireGuard or IPsec) or its mutual authentication feature.
Pillar 3: Policy Enforcement with OPA/Gatekeeper
Open Policy Agent (OPA) provides fine-grained policy enforcement across the entire stack.
Kubernetes Admission Policies
# Gatekeeper constraint template: require non-root containers
apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  name: k8srequirednonroot
spec:
  crd:
    spec:
      names:
        kind: K8sRequiredNonRoot
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8srequirednonroot

        violation[{"msg": msg}] {
          container := input.review.object.spec.containers[_]
          not container.securityContext.runAsNonRoot
          msg := sprintf("Container %v must set securityContext.runAsNonRoot=true", [container.name])
        }

        violation[{"msg": msg}] {
          container := input.review.object.spec.containers[_]
          container.securityContext.runAsUser == 0
          msg := sprintf("Container %v must not run as root (UID 0)", [container.name])
        }
---
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredNonRoot
metadata:
  name: require-non-root
spec:
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Pod"]
    namespaces:
      - production
      - staging
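The two Rego rules can be mirrored in plain Python to check pod specs in CI before they ever hit the admission webhook (non_root_violations is our helper, not a Gatekeeper API):

```python
def non_root_violations(pod_spec: dict) -> list:
    """Mirror of the two Rego violation rules: flag containers that may run as root."""
    msgs = []
    for c in pod_spec.get("containers", []):
        sc = c.get("securityContext", {})
        if not sc.get("runAsNonRoot"):
            msgs.append(f"Container {c['name']} must set securityContext.runAsNonRoot=true")
        if sc.get("runAsUser") == 0:
            msgs.append(f"Container {c['name']} must not run as root (UID 0)")
    return msgs
```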
Zero Trust architecture: every request is verified through identity, policy, and access proxy layers.
API-Level Authorization
OPA can also enforce fine-grained API authorization:
# policy.rego — API authorization
package authz

import rego.v1

default allow := false

# Allow authenticated users to read their own data
allow if {
    input.method == "GET"
    input.path = ["api", "v1", "users", user_id]  # unification binds user_id
    input.user.id == user_id
}

# Allow admins to read any user data
allow if {
    input.method == "GET"
    input.path[0] == "api"
    "admin" in input.user.roles
}

# Allow order-service to create charges via the payments API
allow if {
    input.method == "POST"
    input.path == ["api", "v1", "payments"]
    input.caller.spiffe_id == "spiffe://techsaas.cloud/ns/production/sa/order-service"
}

# Allow destructive operations only during business hours (UTC), and only
# for admins; without the role check this would be a blanket allow for
# every DELETE and PUT during the day.
allow if {
    input.method in ["DELETE", "PUT"]
    "admin" in input.user.roles
    time.clock(time.now_ns())[0] >= 8   # after 08:00
    time.clock(time.now_ns())[0] < 22   # before 22:00
}
Integrate OPA with your API gateway or as a sidecar:
import requests
from functools import wraps
from flask import request, abort

OPA_URL = "http://localhost:8181/v1/data/authz/allow"

def require_authz(f):
    """Deny the request unless OPA evaluates authz.allow to true."""
    @wraps(f)
    def decorated(*args, **kwargs):
        opa_input = {
            "input": {
                "method": request.method,
                "path": request.path.strip("/").split("/"),
                "user": get_user_from_jwt(request),    # app-specific helper
                "caller": get_spiffe_id(request),      # app-specific helper
            }
        }
        result = requests.post(OPA_URL, json=opa_input, timeout=2).json()
        if not result.get("result", False):
            abort(403, "Access denied by policy")
        return f(*args, **kwargs)
    return decorated
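For fast unit tests of the policy's intent without a running OPA, the first three rules can be mirrored in plain Python. The Rego policy stays the source of truth; the time-of-day rule is omitted here to keep the function deterministic.

```python
def authz_allow(inp: dict) -> bool:
    """Plain-Python mirror of the first three authz rules, for quick tests."""
    method, path = inp["method"], inp["path"]
    user = inp.get("user", {})
    # Users may read their own record
    if (method == "GET" and len(path) == 4
            and path[:3] == ["api", "v1", "users"]
            and user.get("id") == path[3]):
        return True
    # Admins may read anything under /api
    if method == "GET" and path and path[0] == "api" and "admin" in user.get("roles", []):
        return True
    # order-service may create payments
    if (method == "POST" and path == ["api", "v1", "payments"]
            and inp.get("caller", {}).get("spiffe_id")
            == "spiffe://techsaas.cloud/ns/production/sa/order-service"):
        return True
    return False
```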
Pillar 4: Secrets Management
HashiCorp Vault with Kubernetes
# External Secrets Operator — sync Vault secrets to Kubernetes
apiVersion: external-secrets.io/v1beta1
kind: SecretStore
metadata:
  name: vault-backend
  namespace: production
spec:
  provider:
    vault:
      server: "http://vault.vault:8200"
      path: "secret"
      version: "v2"
      auth:
        kubernetes:
          mountPath: "kubernetes"
          role: "production-apps"
---
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: payment-service-secrets
  namespace: production
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: vault-backend
    kind: SecretStore
  target:
    name: payment-service-secrets
    creationPolicy: Owner
  data:
    - secretKey: DB_PASSWORD
      remoteRef:
        key: production/payment-service
        property: db_password
    - secretKey: STRIPE_API_KEY
      remoteRef:
        key: production/payment-service
        property: stripe_api_key
Secrets are never stored in Kubernetes manifests or environment variable definitions. They're synced from Vault at runtime and re-synced every refreshInterval (1h above), so rotations in Vault propagate to the cluster automatically.
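If an application reads Vault directly instead of going through ESO, the hvac client can fetch KV v2 secrets. A sketch: it assumes the hvac package, an authenticated client, and a reachable Vault; kv2_secret_data is our helper that just unwraps the KV v2 response shape.

```python
def kv2_secret_data(response: dict) -> dict:
    """Vault KV v2 nests the payload under data.data; return the flat map."""
    return response["data"]["data"]

def read_vault_secret(client, path: str, mount_point: str = "secret") -> dict:
    """client is an authenticated hvac.Client; requires a reachable Vault."""
    resp = client.secrets.kv.v2.read_secret_version(path=path, mount_point=mount_point)
    return kv2_secret_data(resp)
```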
Pod Security Standards
# Enforce restricted pod security
apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted
---
# Compliant pod spec
apiVersion: v1
kind: Pod
metadata:
  name: payment-service
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 1000
    fsGroup: 1000
    seccompProfile:
      type: RuntimeDefault
  containers:
    - name: app
      image: registry.techsaas.cloud/payment-service:v2.1.0@sha256:abc123...
      securityContext:
        allowPrivilegeEscalation: false
        readOnlyRootFilesystem: true
        capabilities:
          drop: ["ALL"]
      resources:
        limits:
          cpu: "500m"
          memory: "256Mi"
        requests:
          cpu: "100m"
          memory: "128Mi"
      volumeMounts:
        - name: tmp
          mountPath: /tmp
  volumes:
    - name: tmp
      emptyDir:
        sizeLimit: 100Mi
Security features:
- readOnlyRootFilesystem: prevents writing to the container filesystem (limits attacker capability)
- capabilities.drop: ["ALL"]: removes all Linux capabilities
- allowPrivilegeEscalation: false: prevents gaining additional privileges
- seccompProfile: RuntimeDefault: restricts system calls
- Image pinned by digest (@sha256:...): prevents supply chain substitution
- Resource limits: prevents resource exhaustion attacks
Pillar 5: Observability and Audit
Runtime Threat Detection with Falco
# Falco rules for Zero Trust violation detection
- rule: Unexpected Network Connection
  desc: Detect connections to unexpected external endpoints
  condition: >
    evt.type in (connect, sendto) and
    fd.net != "127.0.0.0/8" and
    fd.net != "10.0.0.0/8" and
    fd.net != "172.16.0.0/12" and
    k8s.ns.name = "production"
  output: >
    Unexpected outbound connection
    (pod=%k8s.pod.name namespace=%k8s.ns.name
    dest=%fd.name user=%user.name command=%proc.cmdline)
  priority: WARNING

- rule: Shell in Container
  desc: Detect interactive shell in production containers
  condition: >
    spawned_process and
    container and
    proc.name in (bash, sh, zsh, dash) and
    proc.tty != 0 and
    k8s.ns.name = "production"
  output: >
    Shell spawned in production container
    (pod=%k8s.pod.name command=%proc.cmdline user=%user.name)
  priority: CRITICAL

- rule: Sensitive File Access
  desc: Detect access to sensitive paths
  condition: >
    open_read and
    container and
    (fd.name startswith /etc/shadow or
     fd.name startswith /etc/kubernetes or
     fd.name startswith /var/run/secrets)
  output: >
    Sensitive file accessed
    (pod=%k8s.pod.name file=%fd.name command=%proc.cmdline)
  priority: CRITICAL
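To sanity-check the "Shell in Container" condition before deploying it, its logic can be mirrored in Python with the event fields flattened into a plain dict. This is our test harness, not Falco's rule engine.

```python
SHELLS = {"bash", "sh", "zsh", "dash"}

def shell_in_container(evt: dict) -> bool:
    """Mirror of the rule: interactive shell spawned in a production container."""
    return (evt.get("spawned_process", False)
            and evt.get("container", False)
            and evt.get("proc_name") in SHELLS
            and evt.get("proc_tty", 0) != 0       # tty != 0 means interactive
            and evt.get("k8s_ns") == "production")
```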
Kubernetes Audit Logging
# Audit policy — log security-relevant events
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
  # Log all changes to secrets
  - level: RequestResponse
    resources:
      - group: ""
        resources: ["secrets"]
    verbs: ["create", "update", "patch", "delete"]
  # Log all RBAC changes
  - level: RequestResponse
    resources:
      - group: "rbac.authorization.k8s.io"
        resources: ["clusterroles", "clusterrolebindings", "roles", "rolebindings"]
  # Log exec into pods
  - level: RequestResponse
    resources:
      - group: ""
        resources: ["pods/exec", "pods/attach"]
  # Log service account token creation
  - level: Metadata
    resources:
      - group: ""
        resources: ["serviceaccounts/token"]
Network Policies: The First Step
If you implement nothing else, implement network policies. They're built into Kubernetes and provide immediate value:
# Default deny all ingress and egress
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: production
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
---
# Allow specific communication patterns
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: payment-service-network
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: payment-service
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: order-service
      ports:
        - port: 8080
          protocol: TCP
  egress:
    # Allow database access
    - to:
        - podSelector:
            matchLabels:
              app: postgres
      ports:
        - port: 5432
          protocol: TCP
    # Allow DNS
    - to:
        - namespaceSelector: {}
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - port: 53
          protocol: UDP
        - port: 53
          protocol: TCP
Start with default-deny, then explicitly allow required communication paths. This immediately prevents lateral movement from compromised pods.
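Teams that template their manifests in code can generate the default-deny policy per namespace; a minimal sketch (default_deny_policy is our helper, not a Kubernetes client API):

```python
def default_deny_policy(namespace: str) -> dict:
    """Manifest for a deny-all NetworkPolicy; the empty podSelector matches every pod."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny-all", "namespace": namespace},
        "spec": {"podSelector": {}, "policyTypes": ["Ingress", "Egress"]},
    }
```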
Implementation Priority
Don't try to implement everything at once. Follow this order:
Week 1-2: Network Policies
- Default deny in production namespace
- Allow known communication patterns
- Immediate lateral movement prevention
Week 3-4: Pod Security Standards
- Enforce restricted policy on production namespace
- Non-root containers, read-only filesystems
- Drop all capabilities
Month 2: mTLS with Service Mesh or Cilium
- Encrypt all east-west traffic
- Service identity authentication
- L7 authorization policies
Month 3: Secrets Management
- External Secrets Operator + Vault
- Rotate all static secrets
- Remove secrets from environment variables
Month 4: Policy Enforcement
- OPA/Gatekeeper admission policies
- Image signing and verification
- Runtime policies with Falco
Month 5-6: Observability and Audit
- Kubernetes audit logging
- Falco runtime detection
- Network flow logging
- Security dashboards and alerting
The Bottom Line
Zero Trust for cloud-native applications isn't a product you buy — it's an architecture you build, layer by layer. Start with network policies (immediate value, built into Kubernetes). Add mTLS for encrypted service communication. Enforce policies with OPA. Manage secrets with Vault. Monitor everything with Falco and audit logs.
The goal isn't to prevent all attacks — it's to minimize blast radius. A compromised pod in a Zero Trust cluster can't talk to services it shouldn't, can't read secrets it doesn't own, can't escalate privileges, and triggers alerts the moment it deviates from expected behavior.
That's the difference between a compromised pod being an incident and being a catastrophe.