NixOS for DevOps: Reproducible Infrastructure Without the Drift
Why NixOS eliminates configuration drift by design, how Nix flakes lock your infrastructure, and when to choose NixOS over traditional Linux distros for server management.
Every ops team has a horror story about configuration drift. A production server that works differently from staging because someone ran apt-get install three months ago and nobody documented it. A deployment that fails because the build machine has a different version of a shared library than production.
NixOS eliminates this class of problems by design, not by discipline.
Why Configuration Drift Is Still Unsolved
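Traditional configuration management tools (Ansible, Chef, Puppet, Salt) work by mutating a running system toward a desired state. Even when their resources are written declaratively, they only manage what you explicitly list: a package someone installed by hand, or a task you later deleted from a playbook, stays on the machine. Layer enough runs on top of each other, and the actual state of a machine diverges from what your automation thinks it is.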
The industry has tried to solve this with immutable infrastructure (rebuild from scratch every time) and containerization (isolate the application from the host). Both help, but neither addresses the fundamental problem: your host operating system is still a mutable snowflake.
NixOS takes a different approach entirely.
How NixOS Actually Works
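Every package in the Nix store lives at a path that contains a cryptographic hash derived from all of its build inputs: source code, compiler version, flags, dependencies. Change any input and the path changes, so two machines that evaluate the same inputs refer to exactly the same store paths. Builds run in a sandbox to keep undeclared inputs out. Bit-for-bit identical outputs are the goal, and in practice most packages rebuild identically, but a package with a nondeterministic build (embedded timestamps, for example) can still differ between builds.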
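To make the hashing concrete, here is a minimal sketch using the built-in derivation function. The derivation name, the /bin/sh builder, and the flag strings are placeholders chosen for illustration; nothing is actually built, because only the output path is computed.

```nix
# hash-demo.nix (hypothetical file name)
# Evaluate with: nix-instantiate --eval --strict hash-demo.nix
let
  # A toy derivation whose only varying input is `flag`.
  mk = flag: derivation {
    name = "demo";
    system = "x86_64-linux";
    builder = "/bin/sh";
    args = [ "-c" "echo ${flag} > $out" ];
  };
in {
  # Same inputs, same store path (expected: true).
  sameInputsSamePath = (mk "-O2").outPath == (mk "-O2").outPath;

  # One changed input, different store path (expected: true).
  changedInputNewPath = (mk "-O2").outPath != (mk "-O3").outPath;
}
```

The store path is fixed before the build runs, which is why a binary cache can be queried by path alone.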
Your entire system configuration lives in a single file (or a set of Nix modules):
# /etc/nixos/configuration.nix
{ config, pkgs, ... }:
{
  networking.hostName = "prod-api-01";
  networking.firewall.allowedTCPPorts = [ 80 443 22 ];

  services.nginx.enable = true;
  services.postgresql = {
    enable = true;
    package = pkgs.postgresql_16;
    settings.max_connections = 200;
  };

  # Enables the Docker daemon and creates the "docker" group used below
  virtualisation.docker.enable = true;

  users.users.deploy = {
    isNormalUser = true;
    extraGroups = [ "docker" "wheel" ];
    openssh.authorizedKeys.keys = [ "ssh-ed25519 AAAA..." ];
  };

  environment.systemPackages = with pkgs; [
    vim git htop docker-compose
  ];
}
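This is not a config template that generates commands. This IS the system definition. Run nixos-rebuild switch, and NixOS builds and activates exactly this state: packages, services, users, firewall rules, kernel parameters. Anything you delete from the file disappears from the system on the next rebuild. Only mutable data (database contents, files under /var and /home) lives outside the declaration.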
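As the configuration grows, it is usually split into modules. The sketch below assumes a hypothetical modules/ directory next to configuration.nix; the file names are illustrative, not a required layout.

```nix
# configuration.nix: the top-level module mostly composes other modules
{ ... }:
{
  imports = [
    ./hardware-configuration.nix   # generated by nixos-generate-config
    ./modules/web.nix              # hypothetical file
    ./modules/database.nix         # hypothetical file, same pattern as web.nix
  ];

  networking.hostName = "prod-api-01";
  networking.firewall.allowedTCPPorts = [ 22 ];
}
```

```nix
# modules/web.nix
{ ... }:
{
  services.nginx.enable = true;

  # List-typed options defined in several modules are merged, so the
  # firewall ends up allowing 22, 80, and 443.
  networking.firewall.allowedTCPPorts = [ 80 443 ];
}
```

Each module declares only what it needs, and the module system merges the definitions into one system configuration.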
Atomic Upgrades and Instant Rollbacks
When you rebuild, NixOS does not modify packages in place. New packages are written to new paths in /nix/store (which is read-only), and a symlink is atomically flipped to activate the new "generation."
If the upgrade is interrupted mid-way — power failure, network drop, kernel panic — the system boots the previous generation cleanly. There is no half-upgraded state. This is not a feature you enable; it is the only way NixOS works.
# List all system generations
nixos-rebuild list-generations
# Roll back to previous generation (takes seconds)
nixos-rebuild switch --rollback
# Or pick any previous generation from the bootloader menu at boot time
# (GRUB or systemd-boot); every generation is added to it automatically
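Compare this to apt-get upgrade on a production server during a late-night maintenance window, praying that the PostgreSQL 15→16 upgrade does not corrupt the data directory. One caveat: a rollback restores the system, not your data. NixOS runs exactly the PostgreSQL version you declare, but migrating a data directory between major versions is still a step you plan yourself.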
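Because every generation gets a boot entry, the menu (and the boot partition) can fill up over time. A small sketch of capping it, assuming the machine boots with systemd-boot:

```nix
{ ... }:
{
  # Keep only the 10 most recent generations in the boot menu.
  # GRUB users would set boot.loader.grub.configurationLimit instead.
  boot.loader.systemd-boot.configurationLimit = 10;
}
```

This limits boot entries only. Older generations stay in the Nix store until garbage collection removes them.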
Nix Flakes: Locking Your Infrastructure
Nix Flakes (still technically "experimental" but the de facto standard since 2023) add a flake.lock file that pins every input — nixpkgs version, custom overlays, external dependencies — to exact Git revisions.
# flake.nix
{
  inputs = {
    nixpkgs.url = "github:NixOS/nixpkgs/nixos-24.11";
    deploy-rs.url = "github:serokell/deploy-rs";
  };

  outputs = { self, nixpkgs, deploy-rs }: {
    nixosConfigurations.prod-api-01 = nixpkgs.lib.nixosSystem {
      system = "x86_64-linux";
      modules = [ ./configuration.nix ];
    };
  };
}
The flake.lock file is committed to Git. When another engineer rebuilds the system, they get the exact same packages you tested. Not "compatible" packages. Identical packages.
This is the same idea as package-lock.json or Cargo.lock, but applied to the entire operating system.
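One detail worth sketching: each input can bring its own pinned copy of nixpkgs, so the lock file may contain several nixpkgs revisions. The variant below uses follows so that deploy-rs reuses the top-level nixpkgs. Whether that is desirable depends on how closely you want to track the tool's own tested versions, so treat it as an option rather than a rule.

```nix
# flake.nix
{
  inputs = {
    nixpkgs.url = "github:NixOS/nixpkgs/nixos-24.11";
    deploy-rs = {
      url = "github:serokell/deploy-rs";
      # Reuse our nixpkgs instead of the revision deploy-rs locks itself.
      inputs.nixpkgs.follows = "nixpkgs";
    };
  };

  outputs = { self, nixpkgs, deploy-rs }: {
    nixosConfigurations.prod-api-01 = nixpkgs.lib.nixosSystem {
      system = "x86_64-linux";
      modules = [ ./configuration.nix ];
    };
  };
}

# Rebuild from the locked inputs:
#   nixos-rebuild switch --flake .#prod-api-01
# Move the pins forward deliberately, then commit flake.lock:
#   nix flake update
```

Updating becomes an explicit, reviewable change to flake.lock rather than a side effect of rebuilding.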
NixOS vs Ansible/Terraform for Fleet Management
| Dimension | Ansible + Ubuntu | NixOS + Colmena |
|---|---|---|
| State model | Convergent (mutates in place) | Declarative (replaces generations) |
| Drift prevention | Requires discipline + regular enforcement runs | Declared system state cannot drift; mutable data still can |
| Rollback | Manual snapshots or VM restore | Atomic, built-in, per-generation |
| Multi-version packages | Limited (usually one version per package) | Native (coexist in store) |
| Fleet deployment | SSH + playbooks | Colmena/NixOps (evaluates locally, pushes closures) |
| Reproducibility | Best-effort | Every input pinned by hash |
| Learning curve | Moderate (YAML + Jinja2) | Steep (functional programming) |
The comparison is not entirely fair — Terraform manages cloud resources that NixOS does not touch, and Ansible can configure non-NixOS hosts. In practice, many teams use NixOS for the OS layer and Terraform for the cloud infrastructure layer.
Who Uses NixOS in Production
NixOS is not a toy or a hobby project. Companies that run on, or build around, Nix and NixOS include:
- Shopify: has written publicly about adopting Nix for its developer environments
- Channable: uses Nix for packaging and distributing services to production; cites atomic deploys and instant rollbacks as primary benefits
- Determinate Systems: builds enterprise Nix tooling
- Tweag: major Nix consultancy, built much of the modern ecosystem
Fleet management tooling has matured significantly:
- Colmena: deploy NixOS configs to a fleet, tag-based targeting
- NixOps: the original Terraform-inspired deployment tool; a rewrite (NixOps4) is in development
- Bento: lightweight deployment for mixed fleets
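As a rough illustration of tag-based targeting, here is a minimal Colmena hive sketch. The host names, the example.com addresses, and the hosts/ directory are hypothetical, and the channel-based nixpkgs is used only to keep the example short; a real fleet would pin it.

```nix
# hive.nix
{
  meta = {
    # Unpinned for brevity; point this at a fixed nixpkgs checkout in practice.
    nixpkgs = <nixpkgs>;
  };

  # Settings applied to every node.
  defaults = { pkgs, ... }: {
    environment.systemPackages = [ pkgs.htop ];
  };

  web-01 = { ... }: {
    deployment = {
      targetHost = "web-01.example.com";   # hypothetical address
      targetUser = "root";
      tags = [ "web" ];
    };
    imports = [ ./hosts/web-01.nix ];      # hypothetical per-host module
  };

  web-02 = { ... }: {
    deployment = {
      targetHost = "web-02.example.com";   # hypothetical address
      targetUser = "root";
      tags = [ "web" ];
    };
    imports = [ ./hosts/web-02.nix ];      # hypothetical per-host module
  };
}

# Deploy to every node tagged "web":
#   colmena apply --on @web
```

Each per-host module is an ordinary NixOS configuration, so the same file can still be built locally with nixos-rebuild.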
The Honest Downsides
NixOS is not for every team. The trade-offs are real:
Learning curve is steep. Nix is a lazily evaluated, purely functional programming language. You are writing code, not YAML. Error messages are notoriously cryptic. Debugging a failed build requires understanding the evaluation model.
Binary compatibility breaks. NixOS does not follow the Filesystem Hierarchy Standard: there is no /usr/lib, /bin contains only sh, and /usr/bin contains only env. Pre-compiled vendor software (monitoring agents, proprietary tools) often fails to find its dynamic loader or shared libraries. Workarounds exist (buildFHSEnv, nix-ld) but add friction.
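For the nix-ld route, a minimal sketch looks like this. The library list is an assumption; which libraries a given vendor binary needs has to be found out per binary (ldd on the binary is a reasonable starting point).

```nix
{ pkgs, ... }:
{
  # Provide a loader at the conventional path so unpatched binaries can start.
  programs.nix-ld.enable = true;

  # Libraries made visible to those binaries (illustrative selection).
  programs.nix-ld.libraries = with pkgs; [
    stdenv.cc.cc.lib   # libstdc++
    zlib
    openssl
  ];
}
```

This keeps the workaround inside the declared configuration instead of in ad hoc symlinks on the host.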
Disk usage accumulates. The Nix store keeps every past generation until you garbage-collect it. Routine updates can download 500MB+ for minor changes. You need nix-collect-garbage on a schedule.
Knowledge concentration risk. If the one person who understands Nix leaves, the team is stuck. This is the most cited reason enterprises hesitate.
No commercial support tier. Unlike RHEL or Ubuntu LTS, there is no vendor you can call at 3 AM. Determinate Systems and Tweag offer consulting, but nothing comparable to Red Hat support.
When to Choose NixOS
Choose NixOS when:
- Configuration drift has caused production incidents
- You need verifiable evidence that staging and production run the same system closure
- Your team is comfortable learning a new paradigm
- You are building immutable infrastructure and want the OS layer to match
- You manage a fleet of similar servers (not one-off snowflakes)
Stick with Ubuntu/Debian + Ansible when:
- Your team needs fast onboarding without a learning cliff
- You rely heavily on vendor software that assumes FHS
- You need commercial support contracts for compliance
- The existing infrastructure works and drift is not your bottleneck
Getting Started: A Minimal Production Server
{ config, pkgs, ... }:
{
  # Disk and bootloader settings generated by nixos-generate-config
  imports = [ ./hardware-configuration.nix ];

  system.stateVersion = "24.11";

  # Networking
  networking = {
    hostName = "web-01";
    firewall = {
      enable = true;
      allowedTCPPorts = [ 22 80 443 ];
    };
  };

  # SSH hardening
  services.openssh = {
    enable = true;
    settings = {
      PasswordAuthentication = false;
      PermitRootLogin = "no";
    };
  };

  # Automatic security updates
  # Note: without a flake set, this follows the system's channel, so the
  # running packages can move ahead of what is committed to Git.
  system.autoUpgrade = {
    enable = true;
    allowReboot = true;
    dates = "04:00";
  };

  # Garbage collection
  nix.gc = {
    automatic = true;
    dates = "weekly";
    options = "--delete-older-than 14d";
  };

  # Your application
  virtualisation.docker.enable = true;
  services.nginx.enable = true;
}
Commit this to Git. Deploy with nixos-rebuild switch. Roll back if anything breaks. Sleep well knowing the declared system state cannot silently drift.
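If you want automatic upgrades without letting the server move ahead of what is in Git, one option is to point system.autoUpgrade at a flake. This is a sketch under assumptions: the repository github:example-org/infra is hypothetical, and it is assumed to expose a nixosConfigurations entry named after the host.

```nix
{ ... }:
{
  system.autoUpgrade = {
    enable = true;
    # Hypothetical repository; the configuration is selected by hostname.
    flake = "github:example-org/infra";
    dates = "04:00";
    # Leave reboots to a human until you trust the pipeline.
    allowReboot = false;
  };
}
```

With this setup the server only upgrades to whatever flake.lock in the repository pins, so every package change goes through a commit.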
The Bottom Line
NixOS trades familiarity for guarantees. The learning curve is real and the ecosystem has rough edges. But if you have ever spent a weekend debugging why production behaves differently from staging, NixOS solves that problem at the operating system level: not with more tooling on top, but by making drift in the declared system architecturally impossible.
The question is not whether NixOS is better. The question is whether the guarantee of reproducibility is worth the cost of learning a new paradigm. For an increasing number of DevOps teams, the answer is yes.