Getting started

A short checklist to get from a fresh clone to a working make ping in about five minutes. The full reference is in README.md; troubleshooting is at the bottom of that file.

1. Install prerequisites

make install

That installs the system tools this repo needs (jq, ansible, python3-yaml, ansible-lint, …) via your system package manager, then installs the ansible-galaxy collections we use. It's idempotent — already-present tools are skipped silently. Debian/Ubuntu is fully automated; for macOS, the script prints the equivalent brew commands for you to run.

kubectl is the one exception: its apt repository needs signing-key setup, so instead of installing it the script points you to kubernetes.io/docs/tasks/tools/.
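The "already-present tools are skipped" behaviour comes down to a presence check before installing. Below is a minimal sketch, not the repo's install script. It assumes Debian/Ubuntu with apt-get, and that each command name equals its package name. That holds for jq, ansible and ansible-lint but not for python3-yaml. missing_tools and install_missing are illustrative names.

```shell
#!/usr/bin/env bash
# Illustrative only: mirrors the "skip what's already installed" behaviour,
# assuming apt-get and that each command name equals its package name.
missing_tools() {       # usage: missing_tools <command...>; prints the missing ones
  local t
  for t in "$@"; do
    command -v "$t" >/dev/null 2>&1 || printf '%s\n' "$t"
  done
}

install_missing() {     # usage: install_missing <command...>
  local missing
  missing=$(missing_tools "$@")
  if [ -z "$missing" ]; then
    return 0            # everything present: nothing to do, nothing printed
  fi
  # Word splitting of $missing is intentional: one package name per line.
  # shellcheck disable=SC2086
  sudo apt-get install -y $missing
}
```

missing_tools kubectl is also a quick way to see whether you still need to do the manual kubectl install.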

2. Get cluster access

  • A kubeconfig that grants get and list on nodes in every cluster you want to inventory. Ask whoever currently provisions kubeconfigs.
  • An SSH key authorised on the cluster nodes. Currently the nodes accept logins as your personal Hetzner-side user (e.g. jan); once playbooks/bootstrap-ansible-user.yml has been run, they also accept logins as ansible.
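To check the kubeconfig requirement before going further, you can use kubectl auth can-i. It asks the API server whether the current credentials may perform a verb on a resource, and exits 0 for "yes". The sketch below assumes your contexts are already in your kubeconfig. check_node_access is a hypothetical helper, not part of the repo.

```shell
#!/usr/bin/env bash
# Sketch: confirm the kubeconfig grants get and list on nodes for each context.
# `kubectl auth can-i` exits 0 for "yes" and non-zero otherwise.
check_node_access() {   # usage: check_node_access <context...>
  local ctx verb rc=0
  for ctx in "$@"; do
    for verb in get list; do
      if kubectl --context "$ctx" auth can-i "$verb" nodes >/dev/null 2>&1; then
        printf '[OK]   %s: %s nodes\n' "$ctx" "$verb"
      else
        printf '[FAIL] %s: %s nodes\n' "$ctx" "$verb"
        rc=1
      fi
    done
  done
  return "$rc"
}
```

For example, check_node_access $(kubectl config get-contexts -o name) checks every context in your kubeconfig.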

3. Configure your local copy

git clone <repo>            # skip if you already cloned it before step 1
cd ansible
make configure

make configure is interactive: it lists the kubectl contexts it finds and the SSH keys under ~/.ssh/, prompts for your choices, and writes config.yml (gitignored, mode 0600). Ansible and the helper scripts read that file automatically; no source step is needed.
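Mode 0600 means only the file's owner can read or write it. The sketch below writes such a file; make configure already does this for you. kube_contexts is the key this guide refers to. ssh_user and ssh_private_key_file are placeholder names chosen for illustration, and the real config.yml may use a different layout.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of writing an owner-only config.yml. Only kube_contexts
# is a key named in this guide; ssh_user / ssh_private_key_file are placeholders.
write_config() {        # usage: write_config <path> <ssh-user> <key-path> <context...>
  local path=$1 user=$2 key=$3 ctx
  shift 3
  (
    umask 077           # files created in this subshell get mode 0600
    {
      echo "ssh_user: $user"
      echo "ssh_private_key_file: $key"
      echo "kube_contexts:"
      for ctx in "$@"; do
        echo "  - $ctx"
      done
    } > "$path"
  )
  chmod 600 "$path"     # umask does not affect a file that already existed
}
```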

4. Verify

make doctor

Expect mostly [OK] lines. Investigate every [FAIL] before continuing; the hint under each line tells you what to fix. [WARN] lines are non-blocking.
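The [OK]/[FAIL]/[WARN] split separates blocking checks from advisory ones. A sketch of the idea follows; it is not the repo's doctor script. check and warn are illustrative names. A blocking check counts its failures, and an advisory one only reports them.

```shell
#!/usr/bin/env bash
# Illustrative doctor-style checks; not the repo's actual implementation.
# check: a failure is blocking and counted; warn: a failure is only reported.
fails=0

check() {               # usage: check <label> <hint> <command...>
  local label=$1 hint=$2
  shift 2
  if "$@" >/dev/null 2>&1; then
    printf '[OK]   %s\n' "$label"
  else
    printf '[FAIL] %s\n       hint: %s\n' "$label" "$hint"
    fails=$((fails + 1))
  fi
}

warn() {                # usage: warn <label> <hint> <command...>
  local label=$1 hint=$2
  shift 2
  if "$@" >/dev/null 2>&1; then
    printf '[OK]   %s\n' "$label"
  else
    printf '[WARN] %s\n       hint: %s\n' "$label" "$hint"
  fi
}
```

A script built this way would run lines such as check "config.yml exists" "run make configure" test -f config.yml. It would then exit non-zero only when fails is greater than 0.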

5. First contact

make graph                  # show the cluster + groups (queries kubectl)
make ping                   # SSH-to-every-node round-trip

The inventory is dynamic: inventory/k8s-nodes.sh calls kubectl get nodes on every Ansible run, so there is nothing to refresh manually.
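Ansible treats an executable inventory file as a script. It runs the file with --list and expects a JSON object that maps group names to hosts. When that object includes _meta.hostvars, Ansible skips the per-host --host calls. The sketch below shows that contract and is not the contents of inventory/k8s-nodes.sh. Two details are assumptions made so it runs without a cluster: the default group name gem_cluster_01, and reading node names from NODES instead of from kubectl.

```shell
#!/usr/bin/env bash
# Sketch of the Ansible script-inventory contract; NOT inventory/k8s-nodes.sh.
# Assumption: node names arrive as arguments (or via $NODES) instead of from
#   kubectl get nodes -o jsonpath='{.items[*].metadata.name}'
# Kubernetes node names are DNS-style, so no JSON escaping is attempted.
inventory_json() {      # usage: inventory_json <group> <node...>
  local group=$1 hosts="" sep="" n
  shift
  for n in "$@"; do
    hosts+="$sep\"$n\""
    sep=","
  done
  printf '{"%s":{"hosts":[%s]},"_meta":{"hostvars":{}}}\n' "$group" "$hosts"
}

main() {                # Ansible calls the script with --list or --host <name>
  # $NODES is split on whitespace on purpose: one node name per word.
  # shellcheck disable=SC2086
  case "${1:-}" in
    --list) inventory_json "${GROUP:-gem_cluster_01}" $NODES ;;
    --host) echo '{}' ;;            # hostvars already delivered via _meta
    *) echo "usage: $0 --list | --host <name>" >&2; return 2 ;;
  esac
}
```

As a standalone script it would end with main "$@". Running ansible-inventory -i <script> --graph is one way to see what Ansible makes of the output.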

If make ping reports SUCCESS for every host, you're done. For a first read-only look at the nodes, try:

make system-info            # OS, kubelet version, disk usage — read-only
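On a large cluster, a failing host is easy to miss in the make ping output. The sketch below assumes make ping prints Ansible's usual ad-hoc result headers, such as "host | SUCCESS => {" or "host | UNREACHABLE! => {". It lists every host whose status is anything other than SUCCESS. failed_hosts is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: read ansible ad-hoc output on stdin and print the hosts whose
# "host | STATUS => {" header shows anything other than SUCCESS.
failed_hosts() {
  awk -F' [|] ' '/ [|] [A-Z]+!? =>/ {
    split($2, s, " ")             # s[1] is the status word, e.g. UNREACHABLE!
    if (s[1] != "SUCCESS") print $1
  }'
}
```

For example, make ping | failed_hosts prints nothing when every host answered.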

6. (Optional) Switch to the dedicated ansible user

With the dedicated user, you no longer need -K (prompt for the sudo password) on every privileged playbook, and Semaphore can run playbooks unattended:

# one-time: generate the shared key, commit the .pub, store the private half
# in your team's secret manager
ssh-keygen -t ed25519 -f files/ansible-team -C 'ansible@gem-cluster'

# one-time per cluster: run as a human with sudo password, creates the user
make bootstrap-user         # internally: ansible-playbook ... -K

After that, re-run make configure: pick ansible as the SSH user when prompted, and point the private-key path at the team-shared key generated above (files/ansible-team). The new values land in config.yml and are picked up automatically.
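OpenSSH refuses to use a private key that group or others can access. A copy fetched from the secret manager therefore needs owner-only permissions before config.yml points at it. The sketch below checks this. key_ok is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: verify a private key exists and that group/other have no access,
# since ssh rejects keys whose permissions are "too open".
key_ok() {              # usage: key_ok <private-key-path>
  local key=$1 perms
  if [ ! -f "$key" ]; then
    echo "missing: $key" >&2
    return 1
  fi
  perms=$(ls -l "$key" | cut -c5-10)   # group + other permission bits
  if [ "$perms" != "------" ]; then
    echo "too open: $key (try: chmod 600 $key)" >&2
    return 1
  fi
}
```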

What does what

You want to …                         Run
See every target                      make help
Re-validate the local setup           make doctor
See the current node list             make graph
Just check connectivity               make ping
Install baseline packages + motd      make node-baseline
Update apt packages (no reboot)       make apt-update
Update apt + reboot when needed       ansible-playbook playbooks/apt-update.yml -e auto_reboot=true
Update apt with dist-upgrade          ansible-playbook playbooks/apt-update.yml -e dist_upgrade=true
Target a single host                  --limit gem-c01-cp01
Target only k8s nodes (no bastion)    --limit gem_cluster_01 (bastion lives in its own top-level group now)
Dry-run a change                      --check --diff
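The flag rows combine with the ansible-playbook rows. Below is a sketch of a hypothetical wrapper, apt_update, that builds one such command as a bash array so each argument stays intact. With DRY_RUN=1, an illustrative switch rather than a repo convention, it prints the command instead of running it.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper combining the table rows: playbook + --limit + extra flags.
# DRY_RUN=1 prints the command instead of executing it.
apt_update() {          # usage: apt_update <host-or-group> [ansible-playbook args...]
  local target=$1
  shift
  local cmd=(ansible-playbook playbooks/apt-update.yml --limit "$target" "$@")
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

For example, DRY_RUN=1 apt_update gem-c01-cp01 --check --diff prints the single-host dry run, and apt_update gem_cluster_01 -e auto_reboot=true updates only the k8s nodes, rebooting where needed.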

Where to put new automation

  • A new playbook: playbooks/<name>.yml. Add a corresponding make target if it will be run often.
  • A new role (reusable steps, defaults, templates): roles/<role>/{tasks,defaults,templates}/. Use roles/node_baseline/ as a starting point.
  • A new cluster: append its kubectl context to kube_contexts in config.yml and create inventory/group_vars/<context_slug>.yml, where <context_slug> is the context name with dashes replaced by underscores. Add the cluster's bastion to inventory/00-static.yml under the bastion group, and reference it via bastion_inventory_name in the new group_vars file.
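The new-cluster steps can be sketched as a small helper. It derives <context_slug> and creates the group_vars skeleton holding bastion_inventory_name. Both functions are hypothetical, as are the example names gem-cluster-02 and gem-bastion-02. Adding the bastion to inventory/00-static.yml and updating kube_contexts remain manual steps.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the repo: derive <context_slug> and create
# a group_vars skeleton for a new cluster.
context_slug() {        # dashes → underscores
  printf '%s\n' "${1//-/_}"
}

new_cluster_vars() {    # usage: new_cluster_vars <inventory-dir> <kube-context> <bastion-name>
  local dir=$1 ctx=$2 bastion=$3 file
  file="$dir/group_vars/$(context_slug "$ctx").yml"
  if [ -e "$file" ]; then
    echo "exists, not overwriting: $file" >&2
    return 1
  fi
  mkdir -p "$dir/group_vars"
  printf 'bastion_inventory_name: %s\n' "$bastion" > "$file"
  printf '%s\n' "$file"   # report the path that was created
}
```

For example, new_cluster_vars inventory gem-cluster-02 gem-bastion-02 would create inventory/group_vars/gem_cluster_02.yml.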