Getting started

A short checklist to get from a fresh clone to a working make ping in about five minutes. The full reference is in README.md; troubleshooting is at the bottom of that file.

1. Install prerequisites

make install

That installs the system tools this repo needs (jq, ansible, python3-yaml, ansible-lint, …) via your system package manager, then installs the ansible-galaxy collections we use. It's idempotent — already-present tools are skipped silently. Debian/Ubuntu is fully automated; for macOS, the script prints the equivalent brew commands for you to run.

kubectl is the one exception: its apt repo needs signing-key setup, so the script tells you to install it from kubernetes.io/docs/tasks/tools/.
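To confirm the result afterwards, a quick check that the command-line tools are on your PATH might look like the sketch below. The helper name missing_tools is hypothetical and not part of this repo; python3-yaml is left out of the usage example because it is a library rather than a command.

```shell
# Print each named command that is not found on PATH, one per line.
# missing_tools is a hypothetical helper, not something this repo ships.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}
```

Running missing_tools jq ansible ansible-lint kubectl prints nothing when all four are installed.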

2. Get cluster access

A kubeconfig that allows get and list on nodes for every cluster you want to inventory. Ask whoever currently provisions kubeconfigs.
An SSH key authorised on the cluster nodes. Currently nodes accept logins as your personal Hetzner-side user (e.g. jan); once playbooks/bootstrap-ansible-user.yml has been run, they also accept logins as ansible.
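The kubeconfig requirement can be checked with kubectl auth can-i, which exits non-zero when the action is not allowed. The sketch below wraps that in a hypothetical helper, can_inventory, which takes a context name as its only argument.

```shell
# Succeed only if the given kubectl context may both get and list nodes.
# can_inventory is a hypothetical helper name, not part of this repo.
can_inventory() {
  ctx="$1"
  for verb in get list; do
    # `kubectl auth can-i` exits 0 for "yes" and 1 for "no".
    kubectl --context "$ctx" auth can-i "$verb" nodes >/dev/null 2>&1 || return 1
  done
}
```

For example, can_inventory my-context && echo ok, where my-context stands in for one of your own context names.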

3. Configure your local copy

git clone <repo>
cd ansible
make configure

make configure is interactive: it lists the kubectl contexts it found and the SSH keys under ~/.ssh/, then writes config.yml (gitignored, mode 0600). Ansible and the helper scripts read it automatically; no source step needed.
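If you want to double-check the file permissions yourself, the following sketch prints a file's octal mode. The helper name file_mode is hypothetical; it tries the GNU form of stat first and falls back to the BSD/macOS form.

```shell
# Print the octal permission bits of a file, e.g. 600.
# file_mode is a hypothetical helper, not part of this repo.
file_mode() {
  # GNU stat uses -c; BSD/macOS stat uses -f.
  stat -c '%a' "$1" 2>/dev/null || stat -f '%Lp' "$1"
}
```

file_mode config.yml should print 600, and git check-ignore -q config.yml should exit 0 if the file is ignored as described.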

4. Verify

make doctor

The output should be mostly [OK]. Investigate every [FAIL] before continuing; the hints under each line tell you what to fix. [WARN] lines are non-blocking.
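For long output, counting the failures can be quicker than scanning by eye. The sketch below assumes the [FAIL] marker appears literally on each failing line, which is inferred from the description above rather than from the script itself; count_fails is a hypothetical helper.

```shell
# Read doctor-style output on stdin and print how many lines contain [FAIL].
# count_fails is a hypothetical helper; the marker format is an assumption.
count_fails() {
  # grep -c exits 1 when the count is 0, so force a zero exit status.
  grep -c '\[FAIL\]' || true
}
```

Use it as make doctor | count_fails; a result of 0 means nothing is blocking.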

5. First contact

make graph                  # show the cluster + groups (queries kubectl)
make ping                   # SSH-to-every-node round-trip

Inventory is dynamic: inventory/k8s-nodes.sh calls kubectl get nodes on every ansible run, so there is nothing to refresh manually.
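To illustrate the idea, here is a minimal sketch of how a script can turn kubectl get nodes into the JSON that Ansible expects from a dynamic inventory. It is not the repo's inventory/k8s-nodes.sh, which may work differently. The function name inventory_for_context is hypothetical, and naming the group after the context with dashes replaced by underscores is an assumption.

```shell
# Hypothetical sketch; the real inventory/k8s-nodes.sh may differ.
# Print an Ansible inventory JSON document for one kubectl context.
# Assumption: the group is the context name with dashes turned into underscores.
inventory_for_context() {
  ctx="$1"
  group="$(printf '%s' "$ctx" | tr '-' '_')"
  hosts=""
  # `kubectl get nodes -o name` prints one "node/<name>" line per node.
  for node in $(kubectl --context "$ctx" get nodes -o name); do
    # Strip the "node/" prefix and append the quoted name, comma-separated.
    hosts="${hosts:+$hosts, }\"${node#node/}\""
  done
  printf '{"%s": {"hosts": [%s]}, "_meta": {"hostvars": {}}}\n' "$group" "$hosts"
}
```

Because the node list is rebuilt on every call, a node that joins or leaves the cluster shows up on the next ansible run without any cached file to update.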

If make ping returns SUCCESS for every host, you're done. Try:

make system-info            # OS, kubelet version, disk usage — read-only

6. (Optional) Switch to the dedicated ansible user

Switching removes the need for -K (the sudo password prompt) on every privileged playbook and lets Semaphore run playbooks unattended:

# one-time: generate the shared key, commit the .pub, store the private half
# in your team's secret manager
ssh-keygen -t ed25519 -f files/ansible-team -C 'ansible@gem-cluster'

# one-time per cluster: run as a human with sudo password, creates the user
make bootstrap-user         # internally: ansible-playbook ... -K

After that, re-run make configure: pick ansible as the SSH user when prompted and point the private-key path at the team-shared key you generated above (files/ansible-team). The new values land in config.yml and are picked up automatically.
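Before relying on the new user, it can help to confirm that it can log in and use sudo without a password. The sketch below assumes the bootstrap playbook grants passwordless sudo, which is inferred from -K no longer being needed; check_ansible_user is a hypothetical helper.

```shell
# Succeed if the ansible user can log in with the shared key and run sudo
# without a password prompt.
# check_ansible_user is a hypothetical helper; passwordless sudo is an assumption.
check_ansible_user() {
  host="$1"
  # BatchMode makes ssh fail instead of prompting; sudo -n does the same for sudo.
  ssh -i files/ansible-team -o BatchMode=yes -o ConnectTimeout=5 \
    "ansible@$host" sudo -n true
}
```

For example, check_ansible_user gem-c01-cp01 && echo ready.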

What does what

| You want to … | Run |
| --- | --- |
| See every target | `make help` |
| Re-validate the local setup | `make doctor` |
| See the current node list | `make graph` |
| Just check connectivity | `make ping` |
| Install baseline packages + motd | `make node-baseline` |
| Update apt packages (no reboot) | `make apt-update` |
| Update apt + reboot when needed | `ansible-playbook playbooks/apt-update.yml -e auto_reboot=true` |
| Update apt with dist-upgrade | `ansible-playbook playbooks/apt-update.yml -e dist_upgrade=true` |
| Target a single host | `--limit gem-c01-cp01` |
| Target only k8s nodes (no bastion) | `--limit gem_cluster_01` (bastion lives in its own top-level group now) |
| Dry-run a change | `--check --diff` |
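The last three rows are flags rather than full commands, and they can be combined on a single ansible-playbook call. A minimal sketch of a wrapper follows; dry_run and the ANSIBLE_PLAYBOOK override variable are hypothetical names, not part of this repo.

```shell
# Run a playbook against one host in check mode and show the would-be changes.
# dry_run and ANSIBLE_PLAYBOOK are hypothetical; both arguments are required.
dry_run() {
  playbook="$1"
  host="$2"
  # ANSIBLE_PLAYBOOK lets you substitute another binary; defaults to ansible-playbook.
  "${ANSIBLE_PLAYBOOK:-ansible-playbook}" "$playbook" --limit "$host" --check --diff
}
```

For example, dry_run playbooks/apt-update.yml gem-c01-cp01 previews the apt update on one control-plane node without changing it.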

Where to put new automation

A new playbook — playbooks/<name>.yml. Add a corresponding make target if it'll be run often.
A new role (reusable steps, defaults, templates) — roles/<role>/{tasks,defaults,templates}/. Use roles/node_baseline/ as a starting template.
A new cluster — append its kubectl context to kube_contexts in config.yml and create inventory/group_vars/<context_slug>.yml (dashes → underscores). Add its bastion to inventory/00-static.yml under bastion and reference it via bastion_inventory_name in the new group_vars file.
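The slug rule for a new cluster is easy to get wrong by hand, so here is a small sketch that derives the group_vars path from a context name. The helper name group_vars_path is hypothetical, and gem-cluster-02 in the usage note is an illustrative context name only.

```shell
# Print the group_vars path for a kubectl context (dashes become underscores).
# group_vars_path is a hypothetical helper, not part of this repo.
group_vars_path() {
  slug="$(printf '%s' "$1" | tr '-' '_')"
  printf 'inventory/group_vars/%s.yml\n' "$slug"
}
```

group_vars_path gem-cluster-02 prints inventory/group_vars/gem_cluster_02.yml.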