# NixOS for DevOps: Reproducible Infrastructure Without the Drift
Why NixOS eliminates configuration drift by design, how Nix flakes lock your infrastructure, and when to choose NixOS over traditional Linux distros for server management.
Every ops team has a horror story about configuration drift. A production server that works differently from staging because someone ran apt-get install three months ago and nobody documented it. A deployment that fails because the build machine has a different version of a shared library than production.
NixOS eliminates this class of problems by design, not by discipline.
## Why Configuration Drift Is Still Unsolved
Traditional configuration management tools (Ansible, Chef, Puppet, Salt) converge only the resources you explicitly list, on top of a system that stays mutable. They describe *what to change*, not *what the whole system should be*. Delete a package from a playbook and it remains installed; layer enough runs and manual fixes on top of each other, and the actual state of a machine diverges from what your automation thinks it is.
The industry has tried to solve this with immutable infrastructure (rebuild from scratch every time) and containerization (isolate the application from the host). Both help, but neither addresses the fundamental problem: your host operating system is still a mutable snowflake.
NixOS takes a different approach entirely.
## How NixOS Actually Works
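Every package in the Nix store lives at a path containing a cryptographic hash derived from all of its build inputs: source code, compiler version, flags, dependencies. Builds run in an isolated sandbox, so the same inputs always map to the same store path, and changing any input produces a new one. Bit-for-bit identical output is not guaranteed for every package, but the build environment itself is enforced by the tooling rather than by team discipline.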
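To make the hashing of inputs concrete, here is a minimal sketch of a derivation. The file name `hello-note.nix`, the derivation name, and the `greeting` variable are illustrative assumptions, not part of any real system. Building it with `nix-build hello-note.nix` should yield a path of the form `/nix/store/<hash>-hello-note`; editing the greeting, or moving to a nixpkgs revision with a different `hello` package, changes the hash and therefore the path.

```nix
# hello-note.nix (hypothetical example)
{ pkgs ? import <nixpkgs> { } }:

# runCommand takes a name, an attribute set of environment variables,
# and a shell script that must create $out.
# Interpolating pkgs.hello into the script inserts its store path,
# which makes that package one of this derivation's inputs.
pkgs.runCommand "hello-note" { greeting = "hello"; } ''
  mkdir -p "$out"
  echo "$greeting from ${pkgs.hello}/bin/hello" > "$out/note.txt"
''
```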
Your entire system configuration lives in a single file (or a set of Nix modules):
```nix
# /etc/nixos/configuration.nix
{ config, pkgs, ... }:
{
  networking.hostName = "prod-api-01";
  networking.firewall.allowedTCPPorts = [ 80 443 22 ];

  services.nginx.enable = true;

  services.postgresql = {
    enable = true;
    package = pkgs.postgresql_16;
    settings.max_connections = 200;
  };

  users.users.deploy = {
    isNormalUser = true;
    extraGroups = [ "docker" "wheel" ];
    openssh.authorizedKeys.keys = [ "ssh-ed25519 AAAA..." ];
  };

  environment.systemPackages = with pkgs; [
    vim git htop docker-compose
  ];
}
```
This is not a config template that generates commands. This IS the system. Run nixos-rebuild switch, and NixOS converges the entire machine to this state: packages, services, users, firewall rules, kernel parameters. Everything.
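The "set of Nix modules" variant mentioned earlier might look like the following sketch, which also shows kernel parameters being declared. The file name `hardening.nix` and the specific values are assumptions chosen for illustration; `boot.kernel.sysctl` and `boot.kernelParams` are standard NixOS options.

```nix
# hardening.nix (hypothetical module; values are illustrative)
{ ... }:
{
  # Kernel runtime parameters, applied through sysctl
  boot.kernel.sysctl = {
    "net.ipv4.tcp_syncookies" = 1;
    "vm.swappiness" = 10;
  };

  # Kernel command-line parameters, effective from the next boot
  boot.kernelParams = [ "panic=10" ];
}
```

Pulling the module in takes one line in configuration.nix: `imports = [ ./hardening.nix ];`. Removing that line and rebuilding removes the settings again, which is the point of the declarative model.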
## Atomic Upgrades and Instant Rollbacks
When you rebuild, NixOS does not modify packages in place. New packages are written to new paths in /nix/store (which is read-only), and a symlink is atomically flipped to activate the new "generation."
If the upgrade is interrupted mid-way — power failure, network drop, kernel panic — the system boots the previous generation cleanly. There is no half-upgraded state. This is not a feature you enable; it is the only way NixOS works.
```bash
# List all system generations
nixos-rebuild list-generations

# Roll back to the previous generation (takes seconds)
nixos-rebuild switch --rollback

# Boot into any previous generation from GRUB;
# every generation appears in the bootloader automatically
```
Compare this to apt-get upgrade on a production server during a late-night maintenance window, praying that the PostgreSQL 15→16 upgrade does not corrupt the data directory. One caveat applies on both sides: a rollback restores packages and configuration, not application data, so a major database upgrade still needs its own migration plan on NixOS.
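Because every generation gets a bootloader entry, the menu can grow long on a host that is rebuilt often. A minimal sketch, assuming a UEFI machine booting with systemd-boot and an arbitrary limit of 10 entries (GRUB users would set `boot.loader.grub.configurationLimit` instead):

```nix
{ ... }:
{
  boot.loader.systemd-boot.enable = true;
  boot.loader.efi.canTouchEfiVariables = true;

  # Show only the 10 most recent generations in the boot menu.
  # Older generations stay in the Nix store until garbage-collected.
  boot.loader.systemd-boot.configurationLimit = 10;
}
```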
## Nix Flakes: Locking Your Infrastructure
Nix Flakes (still technically "experimental" but the de facto standard since 2023) add a flake.lock file that pins every input — nixpkgs version, custom overlays, external dependencies — to exact Git revisions.
```nix
# flake.nix
{
  inputs = {
    nixpkgs.url = "github:NixOS/nixpkgs/nixos-24.11";
    deploy-rs.url = "github:serokell/deploy-rs";
  };

  outputs = { self, nixpkgs, deploy-rs }: {
    nixosConfigurations.prod-api-01 = nixpkgs.lib.nixosSystem {
      system = "x86_64-linux";
      modules = [ ./configuration.nix ];
    };
  };
}
```
The flake.lock file is committed to Git. When another engineer rebuilds the system, they get the exact same packages you tested. Not "compatible" packages. Identical packages.
This is the same idea as package-lock.json or Cargo.lock, but applied to the entire operating system.
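The flake shown earlier declares deploy-rs as an input but never uses it. The sketch below shows one way the outputs might wire it in, following the conventions in the deploy-rs README as I understand them. The hostname `prod-api-01.example.com` is hypothetical, and the `follows` line is an addition of mine rather than something the original flake contains.

```nix
# flake.nix (sketch; hostname is hypothetical)
{
  inputs = {
    nixpkgs.url = "github:NixOS/nixpkgs/nixos-24.11";
    deploy-rs.url = "github:serokell/deploy-rs";
    # Have deploy-rs reuse the system's nixpkgs revision,
    # so flake.lock pins one nixpkgs instead of two.
    deploy-rs.inputs.nixpkgs.follows = "nixpkgs";
  };

  outputs = { self, nixpkgs, deploy-rs }: {
    nixosConfigurations.prod-api-01 = nixpkgs.lib.nixosSystem {
      system = "x86_64-linux";
      modules = [ ./configuration.nix ];
    };

    # Deployment target: which host receives which configuration
    deploy.nodes.prod-api-01 = {
      hostname = "prod-api-01.example.com";
      profiles.system = {
        user = "root";
        path = deploy-rs.lib.x86_64-linux.activate.nixos
          self.nixosConfigurations.prod-api-01;
      };
    };
  };
}
```

With something like this in place, the `deploy` command shipped by deploy-rs would build the locked configuration and activate it on the target, so what reaches production is what flake.lock describes.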
## NixOS vs Ansible/Terraform for Fleet Management
A head-to-head comparison is not entirely fair: Terraform manages cloud resources that NixOS does not touch, and Ansible can configure non-NixOS hosts. In practice, many teams use NixOS for the OS layer and Terraform for the cloud infrastructure layer.
## Who Uses NixOS in Production
NixOS is not a toy or a hobby project. Production adopters include:
Fleet management tooling has matured significantly:
## The Honest Downsides
NixOS is not for every team. The trade-offs are real:
Learning curve is steep. Nix is a lazily-evaluated purely functional programming language. You are writing code, not YAML. Error messages are notoriously cryptic. Debugging a failed build requires understanding the evaluation model.
Binary compatibility breaks. NixOS does not follow the Filesystem Hierarchy Standard: there is no /usr/lib, and /bin and /usr/bin hold nothing beyond sh and env. Pre-compiled vendor software (monitoring agents, proprietary tools) often fails to find its dynamic loader and shared libraries. Workarounds exist (buildFHSEnv, nix-ld) but add friction.
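As an illustration of the nix-ld workaround, the following sketch enables it system-wide. The library list is an assumption for a generic vendor agent; a real binary may need a different set, which `ldd` on the binary can help identify.

```nix
{ pkgs, ... }:
{
  # nix-ld provides a loader at the conventional path so that
  # unpatched, pre-compiled binaries can start.
  programs.nix-ld.enable = true;

  # Libraries exposed to those binaries (illustrative selection)
  programs.nix-ld.libraries = with pkgs; [
    stdenv.cc.cc
    zlib
    openssl
  ];
}
```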
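Disk usage accumulates. The Nix store keeps every past generation until it is garbage-collected, and routine updates can download 500MB+ for minor changes, because a rebuilt dependency changes the store path of everything built on top of it. You need nix-collect-garbage on a schedule.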
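Beyond scheduled garbage collection, two Nix settings can help keep the store in check. This is a sketch with arbitrary thresholds, not a recommendation for any particular disk size.

```nix
{ ... }:
{
  # Hard-link identical files in the store to save space
  nix.settings.auto-optimise-store = true;

  # If free space drops below about 1 GiB during a build, collect garbage
  # until about 5 GiB is free (values are in bytes and purely illustrative)
  nix.settings.min-free = 1073741824;
  nix.settings.max-free = 5368709120;
}
```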
Knowledge concentration risk. If the one person who understands Nix leaves, the team is stuck. This is the most cited reason enterprises hesitate.
No commercial support tier. Unlike RHEL or Ubuntu LTS, there is no vendor you can call at 3 AM. Determinate Systems and Tweag offer consulting, but nothing comparable to Red Hat support.
## When to Choose NixOS
Choose NixOS when:
Stick with Ubuntu/Debian + Ansible when:
## Getting Started: A Minimal Production Server
```nix
{ config, pkgs, ... }:
{
  system.stateVersion = "24.11";

  # Networking
  networking = {
    hostName = "web-01";
    firewall = {
      enable = true;
      allowedTCPPorts = [ 22 80 443 ];
    };
  };

  # SSH hardening
  # Note: define a user with openssh.authorizedKeys.keys (as in the earlier
  # example) before applying this, or you will lock yourself out.
  services.openssh = {
    enable = true;
    settings = {
      PasswordAuthentication = false;
      PermitRootLogin = "no";
    };
  };

  # Automatic security updates
  system.autoUpgrade = {
    enable = true;
    allowReboot = true;
    dates = "04:00";
  };

  # Garbage collection
  nix.gc = {
    automatic = true;
    dates = "weekly";
    options = "--delete-older-than 14d";
  };

  # Your application
  virtualisation.docker.enable = true;
  services.nginx.enable = true;
}
```
Commit this to Git. Deploy with nixos-rebuild switch. Roll back if anything breaks. Sleep well knowing the server cannot drift.
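One design point is worth considering: without a `flake` setting, system.autoUpgrade follows the host's channel, so a server can move ahead of what is committed in Git. A possible variation, assuming the configuration lives in a flake at a hypothetical repository `github:example-org/infra`, points the upgrade at that flake so unattended upgrades follow the committed flake.lock:

```nix
{ ... }:
{
  system.autoUpgrade = {
    enable = true;
    # Hypothetical repository and host name; replace with your own flake
    flake = "github:example-org/infra#web-01";
    dates = "04:00";
    allowReboot = true;
  };
}
```

Under this setup, bumping package versions becomes a reviewed commit that updates flake.lock rather than something each server does on its own.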
## The Bottom Line
NixOS trades familiarity for guarantees. The learning curve is real and the ecosystem has rough edges. But if you have ever spent a weekend debugging why production behaves differently from staging, NixOS solves that problem at the operating system level — not with more tooling on top, but by making drift architecturally impossible.
The question is not whether NixOS is better. The question is whether the guarantee of reproducibility is worth the cost of learning a new paradigm. For an increasing number of DevOps teams, the answer is yes.