simple-mitigation

A single Go binary that consumes the per-pod ContentionStream gRPC API (see Mitigation-interface.md), evaluates a CEL policy each tick, and fires one or more of four mitigation tiers:

| Tier | Surface | Timescale | Actuator |
|---|---|---|---|
| `isolate` | cgroup v2 `cpu.max` on co-located aggressors | ~100 ms | `pkg/actuators/isolate` |
| `harvest` | cgroup v2 `cpu.max` on co-located best-effort pods | ~100 ms | `pkg/actuators/harvest` |
| `vertical` | `pods/resize` subresource (cpu requests/limits) | ~1 s | `pkg/actuators/vertical` |
| `horizontal` | `apps/v1.Deployment/scale` subresource | ~10 s+ | `pkg/actuators/horizontal` |

The binary runs as a privileged DaemonSet, one instance per node. Each instance subscribes only to victim pods on its own node (field selector spec.nodeName=$NODE_NAME), so node-local mitigations are race-free without leader election. Horizontal scaling is the only action that spans nodes; it is coordinated through the API server with an idempotent /scale patch plus a mitigation/horizontal-last-scaled-at cooldown annotation on the Deployment.

See plan-v2-centralized.md for the full design.

Architecture

   victim pod (this node)             mitigation-controller (this node, DaemonSet)
   :7900 ──gRPC stream──▶  scoreclient ──▶ features (rolling window per pod)
                                                    ↓
                                            policy (CEL rules)
                                                    ↓
                                ┌──────────┬──────────┬──────────┐
                                ▼          ▼          ▼          ▼
                            isolate     harvest    vertical   horizontal
                            (cpu.max)  (cpu.max)  (resize)   (scale)

The simulation's three simple control laws (horizontal bang-bang, isolating saturated ramp, harvesting AIMD) are ported to pkg/controllers and validated against simulation/simulation.py. They are not yet driving the per-tick loop — see plan.md for the wiring status.
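
For readers unfamiliar with AIMD (additive increase, multiplicative decrease), here is a minimal sketch of one harvesting step. It is not the code in pkg/controllers: the function name, parameters, and gain values are hypothetical and chosen only to show the shape of the law.

```go
package main

import "fmt"

// aimdStep returns the next number of cores to lend (h).
// While the victim is calm, h grows by a fixed increment; on contention it
// is multiplied by a back-off factor below 1. The result is clamped to
// [0, maxH]. All names and values here are illustrative.
func aimdStep(h, add, backoff, maxH float64, contended bool) float64 {
	if contended {
		h *= backoff
	} else {
		h += add
	}
	if h < 0 {
		h = 0
	}
	if h > maxH {
		h = maxH
	}
	return h
}

func main() {
	h := 0.0
	contended := []bool{false, false, false, true, false}
	for tick, c := range contended {
		h = aimdStep(h, 0.5, 0.5, 4, c)
		fmt.Printf("tick=%d contended=%v h=%.2f\n", tick, c, h)
	}
}
```

The asymmetry is the point of the design: lending ramps up slowly while the victim is healthy and collapses quickly as soon as contention appears.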

Repo layout

proto/contention.proto                  vendored wire contract (3 spatial-horizon fields added)
gen/go/contentionpb/                    generated (gitignored) -- run `make proto`
pkg/targets/                            multi-victim config loader
pkg/scoreclient/                        gRPC subscriber w/ reconnect + multi-pod fan-in
pkg/podwatch/                           client-go informer (+ NewLocalNodeWatcher for the DaemonSet)
pkg/features/                           rolling window + spatial/temporal feature computation
pkg/policy/                             CEL env, YAML rule loader, fsnotify hot-reload, engine
pkg/cgroup/                             cgroup v2 path resolution + cpu.max read/write
pkg/actuators/                          shared interface + annotation key constants
pkg/actuators/isolate/                  throttles aggressor pods' cpu.max (fraction or absolute cap)
pkg/actuators/harvest/                  raises best-effort pods' cpu.max to lend victim idle cores
pkg/actuators/vertical/                 patches pods/resize for the victim pod
pkg/actuators/horizontal/               patches deployments/scale for the victim Deployment
pkg/controllers/                        the 3 simple control laws ported from simulation.py (cap / n / h)
pkg/aggregator/                         pluggable Max / Mean / P90 (callable from rules)
pkg/thresholder/                        HI/LO + cooldown state machine (also exposed to CEL via `band`)
cmd/mitigation-controller/              the only binary
deploy/controller/                      DaemonSet, RBAC, ConfigMap (targets + policy)
deploy/victim-sample/                   sample search + profile Deployments

Build

Requires Go 1.23 and protoc. On Debian/Ubuntu:

sudo apt install protobuf-compiler
make deps         # installs protoc-gen-go + protoc-gen-go-grpc
make proto        # generates gen/go/contentionpb/*.pb.go
go mod tidy
make build        # equivalent to `go build ./...`
make test         # runs all unit tests

Build the container image:

make docker-controller

The Dockerfile runs make proto inside the build stage, so docker build works from a fresh clone.

Test

Three layers: Go unit tests, the offline control-law parity oracle, and an in-cluster smoke test.

Go unit tests

make test                          # go test ./... (all packages)
go test ./pkg/controllers/...      # the 3 ported control laws + streaming parity
go test ./pkg/cgroup/...           # cpu.max parse / path resolution
go test ./pkg/policy/...           # CEL compile + cooldown engine

make test needs the generated proto stubs (make proto once after a fresh clone) because several packages import gen/go/contentionpb. pkg/controllers has no proto dependency, so it tests standalone even before make proto.

Control-law parity (offline)

simulation/simulation.py is the reference implementation the Go controllers in pkg/controllers are validated against — the test expectations in pkg/controllers/controllers_test.go were cross-checked against it. Run it to regenerate the sweep/figure PNGs or to re-derive expected values:

cd simulation
pip install numpy scipy matplotlib            # one-time
python simulation.py                          # synthetic signals -> *.png
python simulation.py --data run_data_iter1_ready.json   # against a real Gordion trace

It writes sweep_horizontal.png, sweep_isolating.png, sweep_harvesting.png, and ctrl_reference_run.png, plus a numeric summary to stdout.

In-cluster smoke test

After deploying, confirm the pipeline end to end:

# controller is up, one pod per node, and loaded the policy:
kubectl -n mitigation-system rollout status ds/mitigation-controller
kubectl -n mitigation-system logs -l app.kubernetes.io/name=mitigation-controller --tail=20 | grep "policy reloaded"

# drive contention on a victim, then watch actions fire:
kubectl -n mitigation-system logs -l app.kubernetes.io/name=mitigation-controller -f | grep '"msg":"action"'

# verify a cgroup write landed on an aggressor (isolate) / best-effort pod (harvest):
kubectl -n hotelres get pod <aggressor> -o jsonpath='{.metadata.annotations.mitigation/cpu-max-original}'
kubectl -n hotelres get pod <be-pod>     -o jsonpath='{.metadata.annotations.mitigation/harvest-cpu-max-original}'

You can also exercise the score API alone without the controller — see Smoke test the score API directly.

Note: this repo's CI/dev machine may not have a Go toolchain installed; if go is missing, the parity oracle (Python) still runs and is the primary way control-law changes are validated before pushing.

Default policy (out of the box)

Three rules ship in deploy/controller/configmap.yaml, matching plan-v2-centralized.md Section 5 verbatim:

rules:
  - name: sharp_rising_spike
    when: "k_temporal > 0.3 || k_spatial > 0.3"
    fire:
      - kind: isolate
        params: { throttle_fraction: 0.5, aggressor_selector: "tier=batch" }
      - kind: vertical
        params: { scale_factor: 1.5 }
    cooldown: "30s"
    priority: 100

  - name: sustained_high_p50
    when: "p50_now > 0.5 && persistence_h >= 3 && duration_above_hi_ms >= 2000"
    fire:
      - kind: horizontal
        params: { delta: 1 }
    cooldown: "60s"
    priority: 50

  - name: clean_state
    when: "p50_now < 0.2 && k_temporal < 0.0 && tail_now < 0.5"
    fire:
      - kind: restore
        params: { tier: all }
    cooldown: "60s"
    priority: 10

restore is a meta-action: it fans out to every actuator's Restore(), which reads the mitigation/* annotations on the corresponding object and reverses the most recent action.

CEL vocabulary

All feature fields are top-level identifiers (no wrapper object). Each one corresponds to a field of features.FeatureVector (for example, persistence_h maps to PersistenceH):

| Identifier | Type | Meaning |
|---|---|---|
| `target` | string | victim service name |
| `pod` | string | victim pod name |
| `p50_now`, `tail_now` | double | latest p50_trend_pred / tail_trend_label |
| `p50_h`, `tail_h` | list(double) | multi-horizon arrays (empty under a single-horizon predictor) |
| `horizon_ms` | list(int) | parallel array of horizon offsets |
| `k_spatial` | double | least-squares slope of p50_h vs horizon_ms |
| `accel_spatial` | double | mean second-difference of p50_h |
| `p50_max_horizon_ms` | int | argmax horizon |
| `persistence_h` | int | count of p50_h entries >= HI_THRESHOLD |
| `k_temporal` | double | least-squares slope of p50 over the rolling window (per second) |
| `accel_temporal` | double | mean second-difference over the window |
| `variance` | double | sample variance over the window |
| `duration_above_hi_ms` | int | length of the most recent contiguous run above HI_THRESHOLD |
| `window_size` | int | samples currently in the rolling window |
| `has_spatial` | bool | `true` iff the latest event populated `p50_horizons` |
| `model_version` | string | latest event's model_version |
| `source_kind` | string | latest event's source_kind ("onnx" / "formula" / ...) |
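
To make the slope features concrete, the following sketch computes an ordinary least-squares slope over a handful of samples. It is an illustration of the formula only, not the code in pkg/features; the function name and the zero return for degenerate input are assumptions.

```go
package main

import "fmt"

// slope returns the ordinary least-squares slope of ys against xs.
// It returns 0 for fewer than two points, mismatched lengths, or when all
// xs are identical (the slope is undefined in that case).
func slope(xs, ys []float64) float64 {
	n := len(xs)
	if n < 2 || n != len(ys) {
		return 0
	}
	var sumX, sumY float64
	for i := range xs {
		sumX += xs[i]
		sumY += ys[i]
	}
	meanX, meanY := sumX/float64(n), sumY/float64(n)

	var num, den float64
	for i := range xs {
		dx := xs[i] - meanX
		num += dx * (ys[i] - meanY)
		den += dx * dx
	}
	if den == 0 {
		return 0
	}
	return num / den
}

func main() {
	// Four p50 samples taken 100 ms apart; timestamps are in seconds so the
	// slope comes out per second, as k_temporal is defined above.
	ts := []float64{0.0, 0.1, 0.2, 0.3}
	p50 := []float64{0.20, 0.25, 0.30, 0.35}
	fmt.Printf("k_temporal ~ %.2f per second\n", slope(ts, p50))
}
```

The sample series rises by 0.05 per 100 ms, so its slope is about 0.5 per second, which would satisfy the `k_temporal > 0.3` condition of the default sharp_rising_spike rule.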

Two helper functions are registered:

band(score, lo, hi) string -> "up" / "down" / "stable"
count_at_least(list, threshold) int -> count of list entries >= threshold
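
In plain Go, the two helpers might behave like the sketch below. The `count_at_least` behaviour follows the signature above directly. The boundary handling in `band` (at or above hi is "up", at or below lo is "down") is an assumption, and this stateless version ignores the cooldown state that pkg/thresholder maintains.

```go
package main

import "fmt"

// band classifies a score against a LO/HI pair.
// Assumed semantics: score >= hi is "up", score <= lo is "down",
// anything in between is "stable".
func band(score, lo, hi float64) string {
	switch {
	case score >= hi:
		return "up"
	case score <= lo:
		return "down"
	default:
		return "stable"
	}
}

// countAtLeast returns how many entries of values are >= threshold.
func countAtLeast(values []float64, threshold float64) int {
	n := 0
	for _, v := range values {
		if v >= threshold {
			n++
		}
	}
	return n
}

func main() {
	fmt.Println(band(0.6, 0.2, 0.5), band(0.3, 0.2, 0.5), band(0.1, 0.2, 0.5))
	fmt.Println(countAtLeast([]float64{0.4, 0.5, 0.7}, 0.5))
}
```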

Actuator params (`fire[].kind` + `params`)

| kind | params |
|---|---|
| `isolate` | `aggressor_selector` (req), and either `throttle_fraction` (default 0.5, one-shot) or absolute-cap mode: `cap_cores` / `cpu_max_quota_us` (+ `period_us`, `min_quota_us`). Optional `aggressor_namespace`. |
| `harvest` | `be_selector` (req), `harvest_cores` (req, cores to lend on top of baseline). Optional `be_namespace`, `period_us`, `max_quota_us`. |
| `vertical` | `scale_factor` (multiplicative) or `target_cpu` (absolute, e.g. `"750m"`). Clamped to `MIN_CPU`/`MAX_CPU`. |
| `horizontal` | exactly one of `delta` (additive) or `ensure_min` (idempotent floor); optional `min_replicas`/`max_replicas`. |
| `restore` | meta-kind; fans out to every actuator's `Restore()`. |

The absolute-cap mode on isolate and the harvest kind are the actuation surfaces the simulation's isolating (cap) and harvesting (h) controllers drive; see plan.md.
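
As a rough illustration of what `throttle_fraction` means at the cgroup level, the sketch below parses a cgroup v2 cpu.max value ("<quota|max> <period>", both in microseconds) and scales the quota. It is not the code in pkg/cgroup or pkg/actuators/isolate. The function names are hypothetical, and so is the `fallbackQuotaUs` parameter used as the base when the current quota is unlimited; how the real actuator handles "max" is not described here.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseCPUMax parses the content of a cgroup v2 cpu.max file, for example
// "max 100000" (unlimited) or "200000 100000" (two full cores).
func parseCPUMax(s string) (quotaUs, periodUs int64, unlimited bool, err error) {
	f := strings.Fields(s)
	if len(f) != 2 {
		return 0, 0, false, fmt.Errorf("cpu.max: want 2 fields, got %d", len(f))
	}
	periodUs, err = strconv.ParseInt(f[1], 10, 64)
	if err != nil || periodUs <= 0 {
		return 0, 0, false, fmt.Errorf("cpu.max: bad period %q", f[1])
	}
	if f[0] == "max" {
		return 0, periodUs, true, nil
	}
	quotaUs, err = strconv.ParseInt(f[0], 10, 64)
	if err != nil || quotaUs <= 0 {
		return 0, 0, false, fmt.Errorf("cpu.max: bad quota %q", f[0])
	}
	return quotaUs, periodUs, false, nil
}

// throttled returns the cpu.max string after scaling the quota by fraction
// and flooring it at minQuotaUs. fallbackQuotaUs (hypothetical) is the base
// used when the current quota is "max".
func throttled(current string, fraction float64, minQuotaUs, fallbackQuotaUs int64) (string, error) {
	quota, period, unlimited, err := parseCPUMax(current)
	if err != nil {
		return "", err
	}
	if fraction <= 0 || fraction > 1 {
		return "", fmt.Errorf("throttle fraction %v outside (0, 1]", fraction)
	}
	if unlimited {
		quota = fallbackQuotaUs
	}
	next := int64(float64(quota) * fraction)
	if next < minQuotaUs {
		next = minQuotaUs
	}
	return fmt.Sprintf("%d %d", next, period), nil
}

func main() {
	for _, cur := range []string{"200000 100000", "max 100000"} {
		next, err := throttled(cur, 0.5, 1000, 400000)
		fmt.Printf("%q -> %q (err=%v)\n", cur, next, err)
	}
}
```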

Authoring workflow

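1. Edit data.policy.yaml in the ConfigMap.
2. Apply: kubectl apply -f deploy/controller/configmap.yaml.
3. The kubelet updates the mounted volume; fsnotify in pkg/policy/loader.go then triggers engine.Reload within ~1s of the file changing. The kubelet's own propagation of a ConfigMap update to the volume is not instant and can take up to about a minute, depending on its sync settings. Look for policy reloaded in the controller logs.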

A typo in a CEL expression is rejected by engine.Reload and the previous rules stay live -- the controller never goes silent on a bad rule.
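
The "previous rules stay live" behaviour is a compile-then-swap pattern, sketched below with the standard library only. The `compile` function is a placeholder that merely rejects blank expressions; the real engine compiles CEL. The type and method names here are hypothetical and are not the pkg/policy API.

```go
package main

import (
	"fmt"
	"strings"
	"sync/atomic"
)

// ruleSet is an immutable snapshot of compiled rules.
type ruleSet struct{ exprs []string }

// compile is a placeholder for CEL compilation. It only rejects blank
// expressions so the example stays free of third-party imports.
func compile(exprs []string) (*ruleSet, error) {
	for i, e := range exprs {
		if strings.TrimSpace(e) == "" {
			return nil, fmt.Errorf("rule %d: empty expression", i)
		}
	}
	return &ruleSet{exprs: append([]string(nil), exprs...)}, nil
}

// engine holds the live rule set behind an atomic pointer so the tick loop
// can read it without locks while a reload is in progress.
type engine struct{ live atomic.Pointer[ruleSet] }

// Reload compiles the candidate rules first and swaps them in only on
// success. On error the previously stored rule set is left untouched.
func (e *engine) Reload(exprs []string) error {
	next, err := compile(exprs)
	if err != nil {
		return err
	}
	e.live.Store(next)
	return nil
}

func main() {
	var e engine
	fmt.Println("good reload:", e.Reload([]string{"p50_now > 0.5"}))
	fmt.Println("bad reload: ", e.Reload([]string{"   "}))
	fmt.Println("still live: ", e.live.Load().exprs)
}
```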

Default thresholds (explicit)

| Env var | Default | Meaning |
|---|---|---|
| `TICK_MS` | `100` | per-pod policy evaluation cadence |
| `STALE_MS` | `1500` | a snapshot older than this is treated as missing |
| `WINDOW_SIZE` | `20` | rolling-window samples (~2 s at 100 ms cadence) |
| `HI_THRESHOLD` | `0.5` | what counts as "elevated" for PersistenceH / DurationAboveHiMs |
| `MIN_CPU` / `MAX_CPU` | `200m` / `4` | vertical resize clamp |
| `HORIZONTAL_COOLDOWN_SEC` | `30` | cross-node Deployment scale gate |
| `TARGETS_CONFIG` | `/etc/mitigation/targets.yaml` | mounted from the ConfigMap |
| `POLICY_CONFIG` | `/etc/mitigation/policy.yaml` | same |
| `NODE_NAME` | (none) | required; injected via `fieldRef: spec.nodeName` |

Deploy

Prerequisite: K8s >= 1.35 (in-place pod resize GA -- see https://kubernetes.io/blog/2025/12/19/kubernetes-v1-35-in-place-pod-resize-ga/), cgroup v2 on every node, and the pod-security.kubernetes.io/enforce=privileged namespace label is honoured (see deploy/controller/namespace.yaml).

Sample victims

kubectl apply -f deploy/victim-sample/namespace.yaml
kubectl apply -f deploy/victim-sample/search.yaml
kubectl apply -f deploy/victim-sample/profile.yaml

Replace the placeholder image: REGISTRY/...:tag lines with your real images. The fields that matter for mitigations to work: named score port 7900, resources.requests == resources.limits, resizePolicy.cpu = NotRequired.

Mitigation controller

Automated (recommended): build-push-deploy.sh does build → push → manifest rewrite → apply → rollout in one shot:

./build-push-deploy.sh --node=node-3                 # build, push to docclabgroup, deploy pinned to node-3
./build-push-deploy.sh --tag=v2 --node=node-3        # custom tag
./build-push-deploy.sh --no-build --node=node-3      # redeploy the current pushed image
./build-push-deploy.sh --help                        # all options (registry, pull-policy, no-push, ...)

It renders a temp copy of deploy/controller/daemonset.yaml with the image/pull-policy set (and pins the DaemonSet to --node via nodeSelector), applies all four manifests, and waits for rollout — the tracked manifest is left untouched, so git pull never conflicts. The manual steps below are the same thing unpacked.

Turn it off + revert: mitigation-off.sh stops the controller and undoes anything it changed — scales Deployments back to their baseline replicas and recreates any pods whose cpu.max was modified (isolate/harvest) so they restart clean:

./mitigation-off.sh --dry-run    # preview what would be reverted
./mitigation-off.sh              # stop + revert
./mitigation-off.sh --purge      # also delete the namespace + cluster RBAC

First make the image reachable by every node (the DaemonSet runs one pod per node). make docker-controller builds simple-mitigation/mitigation-controller:dev into the local image store of the node you built on; the others need it too. Check your runtime with kubectl get nodes -o wide (CONTAINER-RUNTIME column):

# Save once on the build node:
docker save simple-mitigation/mitigation-controller:dev -o /tmp/mc.tar

# --- containerd runtime: import into the k8s.io namespace on each node ---
sudo ctr -n k8s.io images import /tmp/mc.tar
sudo ctr -n k8s.io images ls | grep mitigation

# --- docker runtime: load on each node (e.g. fan out over SSH) ---
for n in node-1 node-2 node-3 node-4; do
  scp /tmp/mc.tar "$n:/tmp/mc.tar" && ssh "$n" 'docker load -i /tmp/mc.tar'
done

(For a real registry instead, set the image: in daemonset.yaml to <registry>/simple-mitigation/mitigation-controller:<tag> and docker push — then no per-node loading is needed.)

Version note: this design targets K8s >= 1.35 (in-place pods/resize GA). On older clusters the controller still runs and isolate / harvest / horizontal work, but the vertical actuator needs pods/resize (alpha 1.27, beta 1.33, GA 1.35) and will error there — drop the vertical fire from the policy on pre-1.33 clusters.

Then apply, in order:

kubectl apply -f deploy/controller/namespace.yaml
kubectl apply -f deploy/controller/rbac.yaml
kubectl apply -f deploy/controller/configmap.yaml
kubectl apply -f deploy/controller/daemonset.yaml

kubectl -n mitigation-system rollout status ds/mitigation-controller
kubectl -n mitigation-system logs -l app.kubernetes.io/name=mitigation-controller --tail=30

Adding a victim service later is a single ConfigMap edit:

kubectl -n mitigation-system edit cm mitigation-controller-config
# Policy/targets reload via fsnotify within ~1s; no rollout needed.

Crash-safe state (annotations only)

Every action stamps annotations on its target before the actual write so Reconcile() at startup can find and complete an interrupted apply:

| Target | Annotation keys |
|---|---|
| Aggressor Pod | `mitigation/cpu-max-original`, `mitigation/cpu-max-set-by-node`, `mitigation/cpu-max-set-at` |
| Best-effort Pod | `mitigation/harvest-cpu-max-original`, `mitigation/harvest-set-by-node`, `mitigation/harvest-set-at` |
| Victim Pod | `mitigation/cpu-limit-baseline` |
| Victim Deployment | `mitigation/horizontal-last-scaled-at`, `mitigation/horizontal-baseline-replicas` |

No extra storage backend (etcd, Redis, the controller's own CRD) is needed; the API server is the source of truth.
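
The stamp-before-write ordering can be illustrated with a toy in-memory model. The `pod` type below stands in for both the Kubernetes object and its cgroup file, and the `apply` and `restore` functions are hypothetical; in particular, stamping the original only when the annotation is absent is an assumption about how repeated actions avoid overwriting the true baseline.

```go
package main

import "fmt"

// Annotation key from the table above.
const originalKey = "mitigation/cpu-max-original"

// pod is a toy stand-in for an aggressor pod plus its cpu.max file.
type pod struct {
	annotations map[string]string
	cpuMax      string
}

// apply stamps the original value first and only then performs the write.
// crashBeforeWrite simulates the controller dying between the two steps.
func apply(p *pod, newCPUMax string, crashBeforeWrite bool) {
	if _, ok := p.annotations[originalKey]; !ok {
		p.annotations[originalKey] = p.cpuMax // step 1: record the baseline
	}
	if crashBeforeWrite {
		return
	}
	p.cpuMax = newCPUMax // step 2: the actual write
}

// restore reverts to the stamped baseline and clears the annotation.
// It reports false when there is nothing to revert.
func restore(p *pod) bool {
	orig, ok := p.annotations[originalKey]
	if !ok {
		return false
	}
	p.cpuMax = orig
	delete(p.annotations, originalKey)
	return true
}

func main() {
	p := &pod{annotations: map[string]string{}, cpuMax: "max 100000"}
	apply(p, "50000 100000", true) // interrupted apply
	fmt.Println("after crash:  ", p.cpuMax, p.annotations)
	apply(p, "50000 100000", false)
	fmt.Println("after apply:  ", p.cpuMax, p.annotations)
	restore(p)
	fmt.Println("after restore:", p.cpuMax, p.annotations)
}
```

With this ordering, a crash at any point leaves either no trace at all or an annotation that still holds the true baseline, so a later startup pass always has enough information to act on.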

Smoke test the score API directly

This matches the path used during development; the controller does not need to be running.

# terminal 1
kubectl -n hotelres port-forward pod/search-<id> 7900:7900

# terminal 2
grpcurl -plaintext -d '{}' localhost:7900 \
  gordion.contention.ContentionStream/Subscribe

You should see a stream of ScoreEvent JSON objects at roughly 10 Hz, now including p50_horizons / tail_horizons / horizon_ms once the predictor side ships the matching change.

Observability

Logs are JSON, written to stderr via log/slog. Every action emits a single line with rule, kind, pod, node, applied, reason, before, and after, plus err on failure. There is no Prometheus exporter yet; that is deliberately out of scope.

Renaming the module

The module path is github.com/coding-workspace/simple-mitigation-1. To change it (e.g. to your real GitHub org):

OLD=github.com/coding-workspace/simple-mitigation-1
NEW=github.com/your-org/your-repo
grep -rl "$OLD" . --include="*.go" --include="*.proto" --include="Makefile" \
  | xargs sed -i "s|$OLD|$NEW|g"
go mod edit -module "$NEW"
make proto && go mod tidy
