
NixOS for DevOps: Reproducible Infrastructure Without the Drift

Why NixOS eliminates configuration drift by design, how Nix flakes lock your infrastructure, and when to choose NixOS over traditional Linux distros for server management.

Yash Pritwani

23 March 2026 · 11 min read


Every ops team has a horror story about configuration drift. A production server that works differently from staging because someone ran apt-get install three months ago and nobody documented it. A deployment that fails because the build machine has a different version of a shared library than production.

NixOS eliminates this class of problems by design, not by discipline.

Why Configuration Drift Is Still Unsolved

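Traditional configuration management tools (Ansible, Chef, Puppet, Salt) mutate a running system in place. Even when their resources are written declaratively, they only converge what you list: a package installed by hand, or a resource deleted from a playbook but never removed from the host, stays invisible to them. Layer enough runs on top of each other, and the actual state of a machine diverges from what your automation thinks it is.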

The industry has tried to solve this with immutable infrastructure (rebuild from scratch every time) and containerization (isolate the application from the host). Both help, but neither addresses the fundamental problem: your host operating system is still a mutable snowflake.

NixOS takes a different approach entirely.

How NixOS Actually Works

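Every package in the Nix store lives at a path that contains a cryptographic hash derived from ALL its build inputs: source code, compiler version, flags, dependencies. Change any input and the package lands at a different path; keep the inputs the same and you get the same path. Builds run in a sandbox, so for most packages the same inputs also produce identical output, although bit-for-bit reproducibility depends on the build itself being deterministic and is not proven by the hash alone.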
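To make the hashing concrete, here is a minimal sketch using the built-in `derivation` function. The `mk` helper and the "greeting" name are made up for illustration; nothing is built, only the store paths are computed and compared.

```nix
# store-paths.nix: evaluate with `nix-instantiate --eval --strict store-paths.nix`
let
  # Hypothetical helper: a tiny derivation whose only varying input is `msg`.
  mk = msg: derivation {
    name = "greeting";
    system = "x86_64-linux";
    builder = "/bin/sh";
    args = [ "-c" "echo ${msg} > $out" ];
  };
in
{
  # Identical inputs hash to the identical store path.
  sameInputsSamePath = (mk "hello").outPath == (mk "hello").outPath;

  # Changing any input (here, one argument) yields a different path.
  changedInputNewPath = (mk "hello").outPath != (mk "hi").outPath;
}
```

Both attributes evaluate to `true`. The path is derived from the inputs, which is why two versions of the same package can sit side by side in /nix/store without overwriting each other.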

Your entire system configuration lives in a single file (or a set of Nix modules):

# /etc/nixos/configuration.nix
{ config, pkgs, ... }:
{
  networking.hostName = "prod-api-01";
  networking.firewall.allowedTCPPorts = [ 80 443 22 ];

  # sshd must be enabled for the deploy user's key to be of any use
  services.openssh.enable = true;

  services.nginx.enable = true;
  services.postgresql = {
    enable = true;
    package = pkgs.postgresql_16;
    settings.max_connections = 200;
  };

  # Creates the "docker" group referenced below
  virtualisation.docker.enable = true;

  users.users.deploy = {
    isNormalUser = true;
    extraGroups = [ "docker" "wheel" ];
    openssh.authorizedKeys.keys = [ "ssh-ed25519 AAAA..." ];
  };

  environment.systemPackages = with pkgs; [
    vim git htop docker-compose
  ];
}

This is not a config template that generates commands. This IS the system. Run nixos-rebuild switch, and NixOS converges the machine to this state: packages, services, users, firewall rules, kernel parameters. The one thing it does not manage is mutable data, such as database contents under /var/lib or files in home directories.
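As a rough illustration of the "set of Nix modules" option, the sketch below splits the configuration into two modules and lets NixOS merge them. The `web` and `database` names are invented for the example; in a real repository they would usually be separate files listed under `imports`.

```nix
# configuration.nix (sketch): modules are written inline so the example is
# self-contained; normally each would be its own file.
{ ... }:
let
  # Hypothetical web module: opens only the ports it needs.
  web = { ... }: {
    services.nginx.enable = true;
    networking.firewall.allowedTCPPorts = [ 80 443 ];
  };

  # Hypothetical database module.
  database = { pkgs, ... }: {
    services.postgresql = {
      enable = true;
      package = pkgs.postgresql_16;
    };
  };
in
{
  imports = [ web database ];

  networking.hostName = "prod-api-01";

  # List options merge across modules: the final firewall rule set
  # contains 22 from here plus 80 and 443 from the web module.
  networking.firewall.allowedTCPPorts = [ 22 ];
}
```

The design point is that each module declares only its own concern, and the module system computes the combined result at evaluation time rather than applying changes in sequence.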

Atomic Upgrades and Instant Rollbacks

When you rebuild, NixOS does not modify packages in place. New packages are written to new paths in /nix/store (which is read-only), and a symlink is atomically flipped to activate the new "generation."

If the upgrade is interrupted mid-way — power failure, network drop, kernel panic — the system boots the previous generation cleanly. There is no half-upgraded state. This is not a feature you enable; it is the only way NixOS works.

# List all system generations
nixos-rebuild list-generations

# Roll back to previous generation (takes seconds; needs root)
sudo nixos-rebuild switch --rollback

# Boot into any previous generation from the boot menu (GRUB or systemd-boot)
# Every generation appears in the bootloader automatically

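Compare this to apt-get upgrade on a production server during a late-night maintenance window, praying that the PostgreSQL 15→16 upgrade does not corrupt the data directory. One caveat: a rollback restores the system closure, not application data, so a major PostgreSQL upgrade on NixOS still needs its own dump or pg_upgrade plan.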
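Because every generation gets a boot entry, the menu and the boot partition can fill up over time. A small sketch of capping the number of entries, assuming a systemd-boot machine (the limit of 10 is an arbitrary choice):

```nix
{ ... }:
{
  boot.loader.systemd-boot.enable = true;

  # Keep only the 10 most recent generations in the boot menu.
  boot.loader.systemd-boot.configurationLimit = 10;

  # On GRUB systems the equivalent option is:
  # boot.loader.grub.configurationLimit = 10;
}
```

This limits boot entries only. Old generations remain in the Nix store, and remain valid rollback targets, until garbage collection deletes them.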

Nix Flakes: Locking Your Infrastructure

Nix Flakes (still technically "experimental" but the de facto standard since 2023) add a flake.lock file that pins every input — nixpkgs version, custom overlays, external dependencies — to exact Git revisions.

# flake.nix
{
  inputs = {
    nixpkgs.url = "github:NixOS/nixpkgs/nixos-24.11";
    deploy-rs.url = "github:serokell/deploy-rs";
  };

  outputs = { self, nixpkgs, deploy-rs }: {
    nixosConfigurations.prod-api-01 = nixpkgs.lib.nixosSystem {
      system = "x86_64-linux";
      modules = [ ./configuration.nix ];
    };
  };
}

The flake.lock file is committed to Git. When another engineer rebuilds the system, they get the exact same packages you tested. Not "compatible" packages. Identical packages.

This is the same idea as package-lock.json or Cargo.lock, but applied to the entire operating system.
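One way to use the lock file across environments is to build staging and production from the same flake, so both resolve to the same pinned nixpkgs. The sketch below assumes a hypothetical `hosts/` directory with one file per machine; the host names are examples.

```nix
# flake.nix (sketch)
{
  inputs.nixpkgs.url = "github:NixOS/nixpkgs/nixos-24.11";

  outputs = { self, nixpkgs }:
    let
      # Hypothetical helper: one NixOS system per file in ./hosts.
      mkHost = name: nixpkgs.lib.nixosSystem {
        system = "x86_64-linux";
        modules = [ (./hosts + "/${name}.nix") ];
      };
    in
    {
      # genAttrs builds { "prod-api-01" = mkHost "prod-api-01"; ... }
      nixosConfigurations =
        nixpkgs.lib.genAttrs [ "prod-api-01" "staging-api-01" ] mkHost;
    };
}
```

Each host can then be built with `nixos-rebuild switch --flake .#staging-api-01`. Since there is a single flake.lock, a package version can only differ between the two hosts if their host files say so explicitly.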

NixOS vs Ansible/Terraform for Fleet Management

| Dimension | Ansible + Ubuntu | NixOS + Colmena |
|-----------|-----------------|-----------------|
| State model | Imperative (mutates in place) | Declarative (replaces generations) |
| Drift prevention | Requires discipline + regular enforcement runs | Impossible by design |
| Rollback | Manual snapshots or VM restore | Atomic, built-in, per-generation |
| Multi-version packages | Not supported | Native (coexist in store) |
| Fleet deployment | SSH + playbooks | Colmena/NixOps (evaluates locally, pushes closures) |
| Reproducibility | Best-effort | Cryptographic guarantee |
| Learning curve | Moderate (YAML + Jinja2) | Steep (functional programming) |

The comparison is not entirely fair — Terraform manages cloud resources that NixOS does not touch, and Ansible can configure non-NixOS hosts. In practice, many teams use NixOS for the OS layer and Terraform for the cloud infrastructure layer.

Who Uses NixOS in Production

NixOS is not a toy or a hobby project. Production adopters include:

• Shopify: migrating infrastructure to NixOS (2024-2025)

• Channable: uses Nix for packaging and distributing services to production; cites atomic deploys and instant rollbacks as primary benefits

• Determinate Systems: builds enterprise Nix tooling, funded and growing

• Tweag: major Nix consultancy, built much of the modern ecosystem

Fleet management tooling has matured significantly:

• Colmena: deploy NixOS configs to a fleet, tag-based targeting

• NixOps (v4): Terraform-inspired multi-cloud deployments

• Bento: lightweight deployment for mixed fleets
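For a sense of what tag-based targeting looks like, here is a rough sketch of a Colmena hive, assuming the flake-based `colmena` output format from the Colmena 0.4 documentation. Host names, addresses, and the `./hosts/*.nix` files are placeholders.

```nix
# flake.nix (sketch)
{
  inputs.nixpkgs.url = "github:NixOS/nixpkgs/nixos-24.11";

  outputs = { nixpkgs, ... }: {
    colmena = {
      meta.nixpkgs = import nixpkgs { system = "x86_64-linux"; };

      # Applied to every node in the hive.
      defaults = { ... }: {
        services.openssh.enable = true;
      };

      web-01 = { ... }: {
        deployment.targetHost = "web-01.example.internal"; # placeholder
        deployment.tags = [ "web" ];
        imports = [ ./hosts/web-01.nix ];
      };

      db-01 = { ... }: {
        deployment.targetHost = "db-01.example.internal"; # placeholder
        deployment.tags = [ "db" ];
        imports = [ ./hosts/db-01.nix ];
      };
    };
  };
}
```

With a hive like this, `colmena apply --on @web` should deploy only the nodes tagged `web`. Newer Colmena releases may expect a different output name, so check the manual for the version you install.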

The Honest Downsides

NixOS is not for every team. The trade-offs are real:

Learning curve is steep. Nix is a lazily-evaluated purely functional programming language. You are writing code, not YAML. Error messages are notoriously cryptic. Debugging a failed build requires understanding the evaluation model.

Binary compatibility breaks. NixOS does not follow the Filesystem Hierarchy Standard — there is no /usr/lib, no standard /bin. Pre-compiled vendor software (monitoring agents, proprietary tools) often fails to find shared libraries. Workarounds exist (buildFHSEnv, nix-ld) but add friction.
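As a sketch of the nix-ld workaround, the fragment below enables it and exposes a few common shared libraries to unpatched binaries. The library list is only an example; what a given vendor binary needs has to be found by inspecting it (for instance with `ldd`).

```nix
{ pkgs, ... }:
{
  # Provides a dynamic loader at the conventional FHS path so that
  # pre-compiled binaries can start without being patched.
  programs.nix-ld.enable = true;

  # Example library set, an assumption for illustration only.
  programs.nix-ld.libraries = with pkgs; [
    stdenv.cc.cc # libstdc++
    zlib
    openssl
  ];
}
```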

Disk usage accumulates. The Nix store keeps all past generations. Routine updates can download 500MB+ for minor changes. You need nix-collect-garbage on a schedule.
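Alongside scheduled garbage collection, store optimisation can reclaim space by hard-linking identical files. A minimal sketch (the weekly schedule is an arbitrary choice):

```nix
{ ... }:
{
  # Periodically deduplicate identical files in /nix/store.
  nix.optimise = {
    automatic = true;
    dates = [ "weekly" ];
  };
}
```

This reduces the size of what is kept; it does not remove old generations, so it complements garbage collection rather than replacing it.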

Knowledge concentration risk. If the one person who understands Nix leaves, the team is stuck. This is the most cited reason enterprises hesitate.

No commercial support tier. Unlike RHEL or Ubuntu LTS, there is no vendor you can call at 3 AM. Determinate Systems and Tweag offer consulting, but nothing comparable to Red Hat support.

When to Choose NixOS

Choose NixOS when:

• Configuration drift has caused production incidents

• You need verifiable evidence (matching store paths) that staging and production run the same system closure

• Your team is comfortable learning a new paradigm

• You are building immutable infrastructure and want the OS layer to match

• You manage a fleet of similar servers (not one-off snowflakes)

Stick with Ubuntu/Debian + Ansible when:

• Your team needs fast onboarding without a learning cliff

• You rely heavily on vendor software that assumes FHS

• You need commercial support contracts for compliance

• The existing infrastructure works and drift is not your bottleneck

Getting Started: A Minimal Production Server

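{ config, pkgs, ... }:
{
  # Also import the generated ./hardware-configuration.nix (bootloader, disks)
  system.stateVersion = "24.11";

  # Networking
  networking = {
    hostName = "web-01";
    firewall = {
      enable = true;
      allowedTCPPorts = [ 22 80 443 ];
    };
  };

  # SSH hardening
  services.openssh = {
    enable = true;
    settings = {
      PasswordAuthentication = false;
      PermitRootLogin = "no";
    };
  };

  # With root login and passwords disabled, at least one key-based
  # user is required or the machine is unreachable over SSH.
  users.users.deploy = {
    isNormalUser = true;
    extraGroups = [ "wheel" ]; # sudo still asks for a password by default
    openssh.authorizedKeys.keys = [ "ssh-ed25519 AAAA..." ];
  };

  # Automatic upgrades (follows the configured channel unless
  # system.autoUpgrade.flake is set)
  system.autoUpgrade = {
    enable = true;
    allowReboot = true;
    dates = "04:00";
  };

  # Garbage collection
  nix.gc = {
    automatic = true;
    dates = "weekly";
    options = "--delete-older-than 14d";
  };

  # Your application
  virtualisation.docker.enable = true;
  services.nginx.enable = true;
}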

Commit this to Git. Deploy with nixos-rebuild switch. Roll back if anything breaks. Sleep well knowing the server cannot drift.
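If the configuration lives in a flake, automatic upgrades can be pointed at that flake so servers only move to what has been committed, including its flake.lock. The sketch below assumes a hypothetical repository `github:example-org/infra` with a `web-01` configuration; the times are arbitrary.

```nix
{ ... }:
{
  system.autoUpgrade = {
    enable = true;

    # Hypothetical flake reference: rebuild from the committed repo
    # instead of following a channel.
    flake = "github:example-org/infra#web-01";

    dates = "04:00";
    randomizedDelaySec = "30min"; # spread load across a fleet

    # Reboot for kernel changes, but only inside this window.
    allowReboot = true;
    rebootWindow = {
      lower = "03:00";
      upper = "05:00";
    };
  };
}
```

With this arrangement, package versions change on a server only after someone updates flake.lock and pushes it, which keeps upgrades inside the same review process as any other configuration change.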

The Bottom Line

NixOS trades familiarity for guarantees. The learning curve is real and the ecosystem has rough edges. But if you have ever spent a weekend debugging why production behaves differently from staging, NixOS solves that problem at the operating system level — not with more tooling on top, but by making drift architecturally impossible.

The question is not whether NixOS is better. The question is whether the guarantee of reproducibility is worth the cost of learning a new paradigm. For an increasing number of DevOps teams, the answer is yes.

#nixos #nix #infrastructure-as-code #reproducible-builds #devops #configuration-management #immutable-infrastructure
