Ansible

Purpose

Ansible is Lumie's mutable bootstrap layer between raw OCI instances and a GitOps-managed cluster. This page documents the playbooks, roles, inventory generation, and bootstrap side effects that run before Argo CD takes over.

Source Paths

| Path | Role |
| --- | --- |
| lumie-infra/provision/ansible/ansible.cfg | Runtime defaults, SSH behavior, role path, and parallelism |
| lumie-infra/provision/ansible/inventory/terraform_inventory.py | Dynamic inventory generator from terraform output -json |
| lumie-infra/provision/ansible/group_vars/*.yml | K3s version, master URL, node grouping, and shared defaults |
| lumie-infra/provision/ansible/playbooks/site.yml | Full end-to-end cluster bootstrap |
| lumie-infra/provision/ansible/roles/common/tasks/main.yml | OS prep, iptables reset, kernel modules, and sysctl |
| lumie-infra/provision/ansible/roles/storage-setup/tasks/main.yml | Formatting and mounting MinIO block devices |
| lumie-infra/provision/ansible/roles/k3s-master/tasks/main.yml | K3s server install and token export |
| lumie-infra/provision/ansible/roles/k3s-worker/tasks/main.yml | K3s agent join flow |
| lumie-infra/provision/ansible/roles/argocd-bootstrap/tasks/main.yml | Helm install, bootstrap secrets, Git clone, and root app apply |
| lumie-infra/provision/ansible/playbooks/fetch-kubeconfig.yml | Operator kubeconfig export |

Entrypoints

The main operator surface is the playbook set:

| Playbook | Purpose |
| --- | --- |
| playbooks/site.yml | Full bootstrap: prep, master, storage, workers, verify, Argo CD |
| playbooks/k3s-master.yml | Master-only install |
| playbooks/k3s-workers.yml | Worker-only rollout after a master already exists |
| playbooks/fetch-kubeconfig.yml | Save public and private kubeconfig files locally |
| playbooks/k3s-reset.yml | Destructive removal of K3s from all nodes |

site.yml is the authoritative sequence:

- name: Prepare all nodes
- name: Install K3s Master
- name: Setup storage partitions on workers
- name: Install K3s Workers (Account 0214)
- name: Install K3s Workers (Account 0213)
- name: Verify cluster
- name: Bootstrap ArgoCD and GitOps

Runtime Flow

Inventory Contract

inventory/terraform_inventory.py is the bridge between Terraform and Ansible:

  • it shells out to terraform output -json;
  • it builds masters, workers_0214, and workers_0213 groups;
  • it injects ansible_host, private_ip, and k3s_master_url;
  • it derives the worker join target from the actual master private IP instead of a copied static file.

As a result, the Ansible bootstrap depends on a successful, up-to-date terraform apply: if the Terraform outputs are stale, the generated inventory is stale too.
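A minimal sketch of the inventory contract described above, not the actual contents of terraform_inventory.py. The Terraform output names (master, workers_0214, workers_0213), the per-node fields (name, public_ip, private_ip), and the choice of the public IP for ansible_host are assumptions made for illustration. The sketch relies only on the documented shape of terraform output -json, where each output is wrapped in an object with a "value" key.

```python
import json
import subprocess


def load_terraform_outputs(terraform_dir):
    """Run `terraform output -json` in terraform_dir and return the parsed dict."""
    result = subprocess.run(
        ["terraform", "output", "-json"],
        cwd=terraform_dir,
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)


def build_inventory(outputs):
    """Build an Ansible dynamic-inventory dict from Terraform outputs.

    Output and field names here are hypothetical; adjust to the real outputs.
    """
    master = outputs["master"]["value"]
    # The join target is derived from the live master private IP,
    # not from a copied static file.
    master_url = f"https://{master['private_ip']}:6443"

    hostvars = {}
    inventory = {"_meta": {"hostvars": hostvars}}

    def add_group(group, nodes):
        inventory[group] = {"hosts": [node["name"] for node in nodes]}
        for node in nodes:
            hostvars[node["name"]] = {
                "ansible_host": node["public_ip"],  # assumption: SSH via public IP
                "private_ip": node["private_ip"],
                "k3s_master_url": master_url,
            }

    add_group("masters", [master])
    add_group("workers_0214", outputs["workers_0214"]["value"])
    add_group("workers_0213", outputs["workers_0213"]["value"])
    return inventory
```

Keeping build_inventory separate from the terraform call means the group and host-variable logic can be exercised with a plain dictionary, without Terraform state.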

Role Behavior

common

The common role intentionally normalizes base OS state:

  • waits for cloud-init completion;
  • installs packages including iptables, netfilter-persistent, and open-iscsi;
  • sets all default iptables chain policies to ACCEPT;
  • flushes existing iptables rules;
  • loads br_netfilter and overlay;
  • applies K3s-related sysctls.

Flushing firewall rules and setting every default chain policy to ACCEPT is more invasive than a typical application bootstrap. The role assumes these nodes are dedicated cluster hosts.

storage-setup

The storage role formats /dev/sdb as ext4, labels it per host as minio-<hostname>, and mounts it at /mnt/minio-data. The label-based mount is the idempotency guard: reruns do not reformat a correctly labeled and mounted disk.
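The idempotency guard can be illustrated with a small decision function. This is a sketch of the idea rather than the role's actual tasks, which may well use Ansible modules instead of raw commands. The function name and its inputs are hypothetical, and treating any label other than minio-<hostname> as "needs formatting" is an assumption that the source does not confirm.

```python
DEVICE = "/dev/sdb"
MOUNTPOINT = "/mnt/minio-data"


def plan_storage_actions(hostname, current_label, current_mountpoint):
    """Return the commands still needed to reach the desired disk state.

    current_label: filesystem label found on DEVICE, or None if unformatted.
    current_mountpoint: where the device is mounted now, or None.
    """
    expected_label = f"minio-{hostname}"
    actions = []
    # Assumption: a disk without the expected label is treated as unformatted.
    if current_label != expected_label:
        actions.append(f"mkfs.ext4 -L {expected_label} {DEVICE}")
    # Mount by label so the entry does not depend on device naming.
    if current_mountpoint != MOUNTPOINT:
        actions.append(f"mount LABEL={expected_label} {MOUNTPOINT}")
    return actions
```

A correctly labeled and mounted disk yields an empty plan, which is the behavior a rerun relies on.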

k3s-master

The master role:

  • downloads get.k3s.io only when needed;
  • templates /etc/rancher/k3s/registries.yaml;
  • installs K3s server;
  • waits for the local API and node token;
  • reads the token into an Ansible fact for worker joins.

k3s-worker

The worker role:

  • delegates token reads to the master;
  • waits for TCP reachability to 10.0.0.241:6443;
  • installs k3s-agent;
  • waits until kubectl get nodes on the master shows the worker.

The playbooks apply workers serially within each tenancy group to reduce race conditions and make failures easier to pinpoint.
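The reachability wait on 10.0.0.241:6443 amounts to retrying a TCP connect until it succeeds or a deadline passes. The following sketch shows that pattern in plain Python; the function name, timeout, and retry interval are illustrative assumptions, and the role itself presumably uses an Ansible wait task rather than code like this.

```python
import socket
import time


def wait_for_tcp(host, port, timeout=300.0, interval=5.0):
    """Return True once host:port accepts a TCP connection, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect is enough; close the socket immediately.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, wait_for_tcp("10.0.0.241", 6443) would block until the master API port answers or the default timeout elapses. A timeout here with working SSH usually points at the join target or network path rather than at the worker itself.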

argocd-bootstrap

The bootstrap role performs the last imperative steps before GitOps:

  1. Installs Helm if needed.
  2. Creates argocd, minio, and vault namespaces.
  3. Creates minio-root-password and vault-config-secret.
  4. Installs Argo CD from Helm with a minimal bootstrap values file.
  5. Clones lumie-infra into /tmp.
  6. Applies the configured root app manifests.

The bootstrap values intentionally enable anonymous Argo CD access for the first install. Git-managed Argo CD configuration is expected to replace that bootstrap posture afterward.

Kubeconfig Export

fetch-kubeconfig.yml reads /etc/rancher/k3s/k3s.yaml from the master and writes:

  • k3s-public.yaml pointing at the master's public IP;
  • k3s-private.yaml pointing at the master's private IP;
  • config as a symlink to the public variant.

This makes local operator access explicit instead of reusing the server-local 127.0.0.1 kubeconfig.
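The rewrite step can be sketched as a plain text substitution on the server URL. This is an illustration under assumptions, not the playbook's implementation: the function name is hypothetical, and it assumes the source kubeconfig contains the default K3s entry server: https://127.0.0.1:6443.

```python
LOCAL_SERVER = "https://127.0.0.1:6443"


def retarget_kubeconfig(kubeconfig_text, host):
    """Return kubeconfig text whose API server points at host instead of 127.0.0.1."""
    if LOCAL_SERVER not in kubeconfig_text:
        # Fail loudly rather than write a file that still targets localhost.
        raise ValueError(f"expected {LOCAL_SERVER} in kubeconfig")
    return kubeconfig_text.replace(LOCAL_SERVER, f"https://{host}:6443")
```

Calling it once with the master's public IP and once with its private IP would produce the contents of k3s-public.yaml and k3s-private.yaml respectively.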

Contract Drift

Inspection found two mismatches between what the Ansible tree declares and the current state of the repo and cluster:

  • roles/argocd-bootstrap/defaults/main.yml still includes web-apps/application.yaml in app_of_apps_paths, but lumie-infra/web-apps/ does not exist in the current repo. A fresh rerun of the bootstrap role would need that path fixed first.
  • provision/ansible/README.md still states that Traefik is disabled, but the actual server role does not disable it and the live cluster inspected on June 14, 2026 has the bundled Traefik addon running.
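Drift of the first kind can be caught before a rerun with a preflight check that every entry in app_of_apps_paths exists in the checked-out repo. The helper below is a hypothetical sketch; it assumes the paths are relative to the repository root and point at files, which matches the web-apps/application.yaml example but is not confirmed for every entry.

```python
from pathlib import Path


def missing_app_paths(repo_root, app_of_apps_paths):
    """Return the configured root-app paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [path for path in app_of_apps_paths if not (root / path).is_file()]
```

A non-empty result, such as one containing web-apps/application.yaml, means the role defaults need fixing before the bootstrap role is run again.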

Failure Modes

| Failure point | Impact |
| --- | --- |
| Terraform outputs stale or missing | Dynamic inventory generation fails before any host work starts |
| k3s_master_private_ip incorrect | Worker join waits time out even though SSH works |
| /dev/sdb absent on a worker | Storage role hard-fails before the worker rollout continues |
| Vault secret bootstrap omitted | Vault cannot start with its S3 backend config, blocking downstream VaultStaticSecret consumers |
| Missing root app path | Argo CD bootstrap fails during initial app apply |

Verification

Inventory and connectivity:

cd lumie-infra/provision/ansible
./inventory/terraform_inventory.py --list | jq .
ansible all -i inventory/terraform_inventory.py -m ping

Dry-run the bootstrap logic:

ansible-playbook -i inventory/terraform_inventory.py playbooks/site.yml --check

Live cluster confirmation after a real run:

kubectl get nodes -o wide
kubectl get applications -n argocd

Success signals:

  • ./inventory/terraform_inventory.py --list includes the masters, workers_0214, and workers_0213 groups plus k3s_master_url.
  • ansible all ... -m ping returns SUCCESS for every host in the generated inventory.
  • ansible-playbook ... playbooks/site.yml --check reaches the PLAY RECAP without failed hosts for the current bootstrap contract.
  • After a real run, kubectl get nodes -o wide shows joined workers and kubectl get applications -n argocd shows the root GitOps apps created by argocd-bootstrap.
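The first success signal can be checked mechanically against the JSON printed by ./inventory/terraform_inventory.py --list. The sketch below is an assumption-laden example: the function name is hypothetical, and it assumes k3s_master_url is emitted as a per-host variable under the standard _meta.hostvars key rather than as a group variable.

```python
REQUIRED_GROUPS = ("masters", "workers_0214", "workers_0213")


def inventory_problems(inventory):
    """Return a list of human-readable problems found in a --list inventory dict."""
    problems = [
        f"missing group: {group}"
        for group in REQUIRED_GROUPS
        if group not in inventory
    ]
    hostvars = inventory.get("_meta", {}).get("hostvars", {})
    for host, variables in sorted(hostvars.items()):
        # Assumption: every host carries the join target as a host variable.
        if "k3s_master_url" not in variables:
            problems.append(f"{host}: missing k3s_master_url")
    return problems
```

An empty list corresponds to the first success signal; anything else is worth resolving before running site.yml.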