Draft
40 commits
e72beb5
feat(k3s): add ops-mgmt cluster configs and tooling
raisedadead Mar 31, 2026
b0fae18
feat(k3s): add gxy-management galaxy configs and Day 0 spike infrastr…
raisedadead Apr 1, 2026
a564bd6
refactor: consolidate justfiles into root justfile
raisedadead Apr 1, 2026
4ebcc24
feat: consolidate secrets management with ansible-vault
raisedadead Apr 1, 2026
5810c79
feat: add direnv hierarchy and secrets bootstrap workflow
raisedadead Apr 1, 2026
6137073
fix: move scratchpad
raisedadead Apr 2, 2026
c9c1b4e
fix: move archive
raisedadead Apr 2, 2026
b5fc35b
feat(gxy-management): align Day 0 config with spike-plan and ADRs
raisedadead Apr 2, 2026
2332a1c
feat(k8s): add kubeconform manifest validation — local + CI
raisedadead Apr 2, 2026
0619242
fix(k8s): exclude JSON and dashboards from kubeconform validation
raisedadead Apr 2, 2026
9c902c1
feat(cloud-init): update config for Ubuntu 24.04
raisedadead Apr 2, 2026
6ac1504
refactor: migrate secrets from ansible-vault to sops+age
raisedadead Apr 4, 2026
ab0f800
chore: add tailscale justfile recipes and update gxy-management README
raisedadead Apr 4, 2026
8200681
chore: add spike status doc, kubeconfig-sync recipe, fix deploy for TLS
raisedadead Apr 4, 2026
f9ce609
refactor: overhaul justfile and align playbook with sops+direnv
raisedadead Apr 4, 2026
9851893
fix: address review findings — 3 critical, 10 warning, 6 suggestion
raisedadead Apr 4, 2026
c73215d
fix: move Windmill credentials to secret values overlay
raisedadead Apr 4, 2026
3decdca
docs: update SPIKE-STATUS.md with reviewed deployment plan
raisedadead Apr 4, 2026
df3a093
fix: add CIS sysctl prereqs and improve play recipe
raisedadead Apr 4, 2026
7711fe6
refactor: extract galaxy playbook config to Ansible group_vars
raisedadead Apr 4, 2026
5bfa405
fix: reorder playbook — install Cilium before waiting for nodes Ready
raisedadead Apr 4, 2026
edf6521
fix: use variable_host in play names (available at parse time)
raisedadead Apr 4, 2026
e9be94d
fix: set KUBECONFIG for Helm and kubectl in cilium role
raisedadead Apr 4, 2026
270a614
fix: address review findings — etcd backups, Cilium idempotency, secu…
raisedadead Apr 4, 2026
e855771
fix: add KUBECONFIG to gxy-management .envrc
raisedadead Apr 4, 2026
deed8d4
fix: remove manual Gateway API CRD install — conflicts with Traefik C…
raisedadead Apr 4, 2026
c0f2264
feat: add k3s cluster reset playbook
raisedadead Apr 5, 2026
e1d2016
refactor: move k3s flags from extra_server_args to server_config_yaml
raisedadead Apr 5, 2026
7ac23bd
fix: address adversarial review findings (W1-W4)
raisedadead Apr 5, 2026
3d3179f
fix: reset local cleanup delegation + kubeconfig context naming
raisedadead Apr 5, 2026
cfba341
refactor: rewrite galaxy + reset playbooks from standard patterns
raisedadead Apr 5, 2026
7cbd117
chore: remove SPIKE-STATUS.md
raisedadead Apr 5, 2026
51362a7
fix: flush stale Cilium iptables chains in reset playbook
raisedadead Apr 5, 2026
f894482
fix: reset playbook — add gather_facts, fix ansible_user_id, verify c…
raisedadead Apr 5, 2026
adafa0f
refactor(ansible): rename play-k3s--galaxy to play-k3s--bootstrap
raisedadead Apr 5, 2026
44706c7
fix(cilium): add bpf.masquerade, increase timeout, add retries
raisedadead Apr 5, 2026
44b6277
fix(k3s): disable kube-proxy replacement, fix kubeconfig write
raisedadead Apr 5, 2026
d47b359
fix(windmill): remove unused Opaque secretGenerator
raisedadead Apr 5, 2026
bacbf5c
fix(cilium): pin devices/MTU, disable metrics-server
raisedadead Apr 5, 2026
10adb2a
fix(k3s): re-enable metrics-server with hostNetwork workaround
raisedadead Apr 5, 2026
28 changes: 28 additions & 0 deletions .envrc
@@ -0,0 +1,28 @@
SECRETS_DIR="${SECRETS_DIR:-$(expand_path ../infra-secrets)}"

use_sops() {
  local path="$1"
  local type="${2:-dotenv}"
  if [ -f "$path" ]; then
    local decrypted
    # Capture stdout only: sops diagnostics stay on stderr instead of being parsed as dotenv
    decrypted=$(sops -d --input-type "$type" --output-type "$type" "$path") || {
      log_error "sops decrypt failed for $path"
      return
    }
    eval "$(echo "$decrypted" | direnv dotenv bash /dev/stdin)"
    watch_file "$path"
  fi
}

if [ -d "$SECRETS_DIR" ]; then
  use_sops "$SECRETS_DIR/global/.env.enc"
else
  log_error "infra-secrets repo not found at $SECRETS_DIR"
  log_error "Clone it: git clone git@github.com:freeCodeCamp/infra-secrets.git ../infra-secrets"
fi

dotenv_if_exists .env

if [ -d ansible/.venv ]; then
  PATH_add ansible/.venv/bin
fi
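The `use_sops` helper above decrypts with sops, feeds the plaintext through `direnv dotenv`, and logs an error instead of aborting the shell when decryption fails. A minimal sketch of that same flow in plain bash, with a hypothetical `load_env_from_cmd` helper standing in for both tools (any command can play the decryptor), could look like this. It assumes a deliberately simplified dotenv dialect (one `KEY=value` per line, optional surrounding quotes, no escapes or expansion); in the real `.envrc`, `direnv dotenv` remains the parser to rely on.

```shell
#!/usr/bin/env bash
# Minimal sketch of the use_sops flow with no sops/direnv dependency.
# load_env_from_cmd is a hypothetical helper: it runs "$@" as the "decryptor"
# and exports KEY=value lines from its stdout (simplified dotenv dialect).
load_env_from_cmd() {
  local out line key value
  local kv='^([A-Za-z_][A-Za-z0-9_]*)=(.*)$'
  local dq='^"(.*)"$'
  local sq="^'(.*)'\$"
  # Keep stdout only, so diagnostics on stderr are never parsed as variables.
  if ! out=$("$@"); then
    echo "decrypt failed: $*" >&2
    return 1
  fi
  while IFS= read -r line; do
    # Skip blank lines and comments.
    [[ -z "$line" || "$line" == \#* ]] && continue
    # Ignore anything that is not NAME=value with a valid shell identifier.
    [[ "$line" =~ $kv ]] || continue
    key="${BASH_REMATCH[1]}"
    value="${BASH_REMATCH[2]}"
    # Strip one pair of matching surrounding quotes, if present.
    if [[ "$value" =~ $dq || "$value" =~ $sq ]]; then
      value="${BASH_REMATCH[1]}"
    fi
    export "$key=$value"
  done <<< "$out"
}
```

Calling `load_env_from_cmd sops -d --input-type dotenv --output-type dotenv secrets.env.enc` would mirror `use_sops` (the file name is illustrative); a failing command returns non-zero before anything is exported, so the existing environment is left as it was.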
35 changes: 35 additions & 0 deletions .github/workflows/k8s--validate.yml
@@ -0,0 +1,35 @@
name: K8s -- Manifest Validation

on:
  push:
    branches:
      - main
  pull_request:
    branches:
      - main
  workflow_dispatch:

permissions:
  contents: read

jobs:
  validate:
    name: K8s -- Manifest Validation
    runs-on: ubuntu-latest

    steps:
      - name: Checkout
        uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4

      - name: Install just
        run: |
          curl --proto '=https' --tlsv1.2 -sSf https://just.systems/install.sh | bash -s -- --to /usr/local/bin
          just --version

      - name: Install kubeconform
        run: |
          curl -sL https://github.com/yannh/kubeconform/releases/download/v0.7.0/kubeconform-linux-amd64.tar.gz | tar xz -C /usr/local/bin
          kubeconform -v

      - name: Validate K8s manifests
        run: just k8s-validate 1.32.0
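The CI job delegates to `just k8s-validate 1.32.0`, whose recipe lives in the root justfile and is not part of this diff. A hypothetical sketch of what such a recipe could run, assuming manifests are `*.yaml`/`*.yml` files and dashboards sit under `dashboards/` directories (in line with the commit that excludes JSON and dashboards from validation), might be:

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a k8s-validate recipe body; the real recipe is in the
# root justfile, which is not shown in this diff.

# list_manifests: print YAML manifests under $1, sorted. JSON files are skipped
# because only *.yaml/*.yml are matched; anything under dashboards/ is skipped too.
list_manifests() {
  find "$1" -type f \( -name '*.yaml' -o -name '*.yml' \) \
    -not -path '*/dashboards/*' | sort
}

# validate_manifests: validate every listed manifest against one Kubernetes
# version's schemas. -ignore-missing-schemas lets kinds with no schema in
# kubeconform's default location (CRDs, kustomization files) pass instead of failing.
validate_manifests() {
  local root="$1" version="${2:-1.32.0}" f
  local files=()
  while IFS= read -r f; do
    files+=("$f")
  done < <(list_manifests "$root")
  if [ "${#files[@]}" -eq 0 ]; then
    echo "no manifests found under $root" >&2
    return 1
  fi
  kubeconform -strict -summary -ignore-missing-schemas \
    -kubernetes-version "$version" "${files[@]}"
}
```

`-ignore-missing-schemas` is a trade-off: kinds without a published schema are skipped rather than checked, so CRD-heavy manifests get less coverage. Adding a CRD schema catalog via extra `-schema-location` flags is the stricter alternative.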
7 changes: 5 additions & 2 deletions .gitignore
@@ -36,7 +36,7 @@ terraform.rc
.vscode/

# Ignore User-specific temporary files
__scratchpad__/
.scratchpad

# Ignore generated files
manifest.json
@@ -48,7 +48,6 @@ ansible/inventory/hosts
# Secrets
*.env
*.env.*
.envrc

.kubeconfig.yaml
*.crt
@@ -61,3 +60,7 @@ secrets.overrides.yaml
o11y/defaults/

.beads-credential-key

# Beads / Dolt files (added by bd init)
.dolt/
*.db
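This PR adds a tracked `.envrc` at the repository root, so the second hunk (`-48,7 +48,6`) drops the `.envrc` entry while the dotenv patterns stay. A throwaway-repo sketch using `git check-ignore` (only a subset of the patterns above, with illustrative file names) shows that `.envrc` becomes trackable while plain and suffixed dotenv files and `*.db` files remain ignored:

```shell
#!/usr/bin/env bash
# Sketch: check which paths a subset of the .gitignore patterns above still
# ignores, using a throwaway repository. File names are illustrative.
repo=$(mktemp -d)
git -C "$repo" init -q
printf '%s\n' '*.env' '*.env.*' '.dolt/' '*.db' > "$repo/.gitignore"

# is_ignored: exit 0 if git would ignore the path (relative to the repo root).
is_ignored() {
  git -C "$repo" check-ignore -q -- "$1"
}

for path in .envrc .env prod.env .env.local beads.db; do
  touch "$repo/$path"
  if is_ignored "$path"; then
    echo "ignored: $path"
  else
    echo "tracked: $path"
  fi
done
```

Gitignore wildcards also match leading dots, which is why a bare `.env` is caught by `*.env` while `.envrc` matches neither dotenv pattern.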
File renamed without changes.
File renamed without changes.
2 changes: 2 additions & 0 deletions ansible/ansible.cfg
@@ -8,6 +8,8 @@ inventory = ./inventory
home = ./.ansible
collections_path = ./.ansible/collections:./roles
roles_path = ./.ansible/roles:./roles
# Secrets managed via sops+age in the infra-secrets private repo
# Env vars loaded via direnv; vault vars via community.sops collection when needed

[inventory]
enable_plugins = yaml, ini, toml, community.general.linode, community.digitalocean.digitalocean
1 change: 1 addition & 0 deletions ansible/inventory/digitalocean.yml
@@ -2,6 +2,7 @@
plugin: community.digitalocean.digitalocean
api_token: "{{ lookup('ansible.builtin.env', 'DO_API_TOKEN') }}"
attributes:
- id
- name
- tags
- networks
42 changes: 42 additions & 0 deletions ansible/inventory/group_vars/gxy_mgmt_k3s.yml
@@ -0,0 +1,42 @@
---
# gxy-management galaxy configuration
# Applied automatically when targeting the gxy_mgmt_k3s inventory group
#
# To add a new galaxy: create a new file matching the DO inventory tag.

galaxy_name: gxy-management
k3s_version: v1.34.5+k3s1
cilium_cluster_id: 1

# k3s config.yaml — written to /etc/rancher/k3s/config.yaml by the role
# Keys are hyphenated, matching CLI flags. Docs:
# https://docs.k3s.io/installation/configuration
# https://docs.k3s.io/security/hardening-guide
#
# Do NOT add tls-san, token, cluster-init, or server here —
# those are managed by the k3s-ansible role via extra_server_args.
server_config_yaml: |
  flannel-backend: "none"
  disable-network-policy: true
  # Cilium kube-proxy replacement is off (it breaks etcd on k3s HA; see field-notes Failure 7),
  # so k3s keeps running its own kube-proxy (disable-kube-proxy: false)
  # Cilium still provides CNI + network policies + Hubble without it
  # Revisit on bare metal where performance matters
  disable-kube-proxy: false
  cluster-cidr: "10.1.0.0/16"
  service-cidr: "10.11.0.0/16"
  protect-kernel-defaults: true
  secrets-encryption: true
  kube-apiserver-arg:
    - "admission-control-config-file=/etc/rancher/k3s/pss-admission.yaml"
    - "audit-log-path=/var/log/k3s/audit.log"
    - "audit-policy-file=/etc/rancher/k3s/audit-policy.yaml"
    - "audit-log-maxage=30"
    - "audit-log-maxbackup=10"
    - "audit-log-maxsize=100"
  etcd-s3: true
  etcd-s3-endpoint: "fra1.digitaloceanspaces.com"
  etcd-s3-bucket: "net.freecodecamp.universe-backups"
  etcd-s3-folder: "etcd/gxy-management"
  etcd-s3-region: "fra1"
  etcd-snapshot-schedule-cron: "0 */6 * * *"
  etcd-snapshot-retention: 20
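Two values in this file are easy to sanity-check. The schedule `0 */6 * * *` takes four snapshots a day, so `etcd-snapshot-retention: 20` keeps roughly 20 × 6 = 120 hours (about five days) of restore points, assuming every scheduled snapshot succeeds and the retention count applies to these scheduled snapshots. The pod (`10.1.0.0/16`) and service (`10.11.0.0/16`) CIDRs must also not overlap. A small bash sketch of both checks, using hypothetical helper names:

```shell
#!/usr/bin/env bash
# Illustrative sanity checks for two settings in this galaxy file.
# The helper names are hypothetical; nothing here is read from k3s itself.

# snapshot_window_hours: approximate look-back for an "every N hours" schedule.
# The oldest kept snapshot is between (retention-1)*N and retention*N hours old;
# this prints the upper bound. Assumes the cron hour field has the form */N.
snapshot_window_hours() {
  local cron="$1" retention="$2" _min hour_field _rest step
  read -r _min hour_field _rest <<< "$cron"
  case "$hour_field" in
    \*/*) step="${hour_field#\*/}" ;;
    *) echo "unsupported hour field: $hour_field" >&2; return 1 ;;
  esac
  [[ "$step" =~ ^[0-9]+$ ]] || { echo "bad step: $step" >&2; return 1; }
  echo $(( step * retention ))
}

# ip_to_int: convert a dotted-quad IPv4 address to a 32-bit integer
# (10# forces base 10 so octets like 08 are not read as octal).
ip_to_int() {
  local a b c d
  IFS=. read -r a b c d <<< "$1"
  echo $(( (10#$a << 24) | (10#$b << 16) | (10#$c << 8) | 10#$d ))
}

# cidrs_overlap: exit 0 if two IPv4 CIDR ranges share any address, i.e. the
# networks agree on the bits covered by the shorter (wider) prefix.
cidrs_overlap() {
  local net1="${1%/*}" len1="${1#*/}" net2="${2%/*}" len2="${2#*/}"
  local len=$(( len1 < len2 ? len1 : len2 ))
  local mask=$(( len == 0 ? 0 : (0xFFFFFFFF << (32 - len)) & 0xFFFFFFFF ))
  (( ($(ip_to_int "$net1") & mask) == ($(ip_to_int "$net2") & mask) ))
}
```

With the values above, `snapshot_window_hours '0 */6 * * *' 20` prints 120, and `cidrs_overlap 10.1.0.0/16 10.11.0.0/16` returns non-zero, confirming the pod and service ranges are disjoint.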
63 changes: 0 additions & 63 deletions ansible/justfile

This file was deleted.
