Notifications (notify)

How to configure outbound alerts for pipeline start, success, and error events, and how to test channels without running a pipeline.

kzero sends notification HTTP POSTs only when run.mode is live (down, up, reset). In dry-run mode, pipelines do not fire notifications. Use kzero notify test to verify channel wiring at any time.

Quick test (no cluster mutations)

  1. Enable at least one channel in your profile (or via KZERO_* env vars below).
  2. Run:
kzero notify test --config /path/to/kzero.yaml
  3. Expect exit code 0 and this line on stdout: notify test: sent event "notify.test" to enabled channel(s).

Preview real event formatting:

kzero notify test -c /path/to/kzero.yaml --event pipeline.start
kzero notify test -c /path/to/kzero.yaml --event pipeline.success
kzero notify test -c /path/to/kzero.yaml --event pipeline.error

pipeline.error in test mode includes sample failed_step and error fields so you can check Slack/Teams layout before a real failure.

YAML schema

notify:
  on_error: true          # default true when any channel is enabled
  slack:
    enabled: false
    webhook_url: ""
  discord:
    enabled: false
    webhook_url: ""
  teams:
    enabled: false
    webhook_url: ""
  pagerduty:
    enabled: false
    routing_key: ""       # Events API v2 integration key
  webhook:
    enabled: false
    url: ""
    headers: {}           # optional extra HTTP headers

Annotated reference: configs/kzero.sample.yml.

Events

| Event | When (live pipelines) |
| --- | --- |
| `pipeline.start` | After the Kubernetes `target:` block, before the first hook or step |
| `pipeline.success` | After successful post-down / post-up / full reset |
| `pipeline.error` | On fail-fast, before `hooks.on-error` (includes step ref when available) |
| `notify.test` | Only via `kzero notify test` (not sent by pipelines) |

Set notify.on_error: false to suppress pipeline.error while keeping start/success.
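
The gating rules described so far (live mode only, notify.on_error, and notify.test reserved for the CLI) can be summarized as a small decision function. This is a sketch of the documented behavior, not kzero's implementation; the function name should_notify and its argument order are hypothetical, and it assumes at least one channel is enabled.

```shell
#!/bin/sh
# Hypothetical helper (not part of kzero): mirrors the documented rules
# for whether a pipeline run would send a given notification.
# Usage: should_notify MODE EVENT ON_ERROR   (exit 0 = sent, 1 = skipped)
should_notify() {
    mode=$1 event=$2 on_error=$3

    # Pipelines only notify in live mode; dry-run never fires.
    [ "$mode" = "live" ] || return 1

    case "$event" in
        pipeline.start|pipeline.success) return 0 ;;
        # pipeline.error is suppressed when notify.on_error is false.
        pipeline.error) [ "$on_error" = "true" ] ;;
        # notify.test is sent only by `kzero notify test`, never by pipelines.
        *) return 1 ;;
    esac
}
```

For example, should_notify live pipeline.error false returns 1, which matches the case where start and success alerts are kept but error alerts are turned off.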

Channel examples

Generic webhook (JSON body)

Best for custom integrations, n8n, or internal receivers. The webhook channel sends the full structured payload:

{
  "event": "pipeline.success",
  "command": "reset",
  "mode": "live",
  "client_id": "ops-team-a",
  "cluster_name": "example-cluster",
  "started_at": "2026-06-05T12:00:00Z",
  "duration": "18m32s"
}

Profile configuration:

notify:
  webhook:
    enabled: true
    url: "https://hooks.example.com/kzero"
    headers:
      Authorization: "Bearer ${TOKEN}"   # expand in your secret manager / env before deploy
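
To exercise a custom receiver before pointing kzero at it, the sample payload above can be replayed by hand. The sketch below only reproduces the documented fields. The Content-Type header, the TOKEN variable, and the function name sample_payload are assumptions, and real kzero requests may carry other headers or fields.

```shell
#!/bin/sh
# Hypothetical helper: prints the sample payload documented above.
sample_payload() {
    cat <<'EOF'
{
  "event": "pipeline.success",
  "command": "reset",
  "mode": "live",
  "client_id": "ops-team-a",
  "cluster_name": "example-cluster",
  "started_at": "2026-06-05T12:00:00Z",
  "duration": "18m32s"
}
EOF
}

# Replay it against your own receiver (URL and TOKEN are placeholders):
# sample_payload | curl -sS -X POST \
#     -H "Content-Type: application/json" \
#     -H "Authorization: Bearer ${TOKEN}" \
#     --data-binary @- "https://hooks.example.com/kzero"
```

The curl call is left commented out so that sourcing the snippet never sends traffic by accident.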

Slack incoming webhook

notify:
  slack:
    enabled: true
    webhook_url: "https://hooks.slack.com/services/T…/B…/…"

Slack receives an attachment with a colored vertical bar per event: blue (started), green (completed), red (error), yellow (test). The title follows the pattern kzero {action}, and the attachment footer shows kzero vX.Y.Z, taken from the running binary's build metadata. Bullet fields include Cluster, Client, Time, Context, User, and Mode, plus Duration on success. Slack also adds its own incoming-webhook attribution below that footer.

Discord webhook

notify:
  discord:
    enabled: true
    webhook_url: "https://discord.com/api/webhooks/…"

Microsoft Teams workflow / connector URL

notify:
  teams:
    enabled: true
    webhook_url: "https://….webhook.office.com/…"

PagerDuty Events API v2

notify:
  pagerduty:
    enabled: true
    routing_key: "your-integration-key"

pipeline.error events trigger with error severity; pipeline.start and pipeline.success use info.
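
To check a routing key independently of kzero, a generic PagerDuty Events API v2 trigger can be sent by hand using the same severity mapping. This is a sketch of a plain Events API v2 request, not necessarily the body kzero sends. The function name pd_event_body, the summary and source values, and the PD_ROUTING_KEY variable are hypothetical. The severity used for notify.test is not stated here, so the sketch leaves it unmapped.

```shell
#!/bin/sh
# Hypothetical helper: builds a generic Events API v2 "trigger" body.
# Usage: pd_event_body ROUTING_KEY EVENT
# Note: values are inserted verbatim, so they must not contain quotes.
pd_event_body() {
    routing_key=$1 event=$2

    # Documented mapping: errors use "error", start/success use "info".
    case "$event" in
        pipeline.error) severity="error" ;;
        pipeline.start|pipeline.success) severity="info" ;;
        *) echo "unmapped event: $event" >&2; return 1 ;;
    esac

    cat <<EOF
{
  "routing_key": "$routing_key",
  "event_action": "trigger",
  "payload": {
    "summary": "kzero $event (manual check)",
    "source": "manual-check",
    "severity": "$severity"
  }
}
EOF
}

# Send it by hand (a real routing key opens a real alert):
# pd_event_body "$PD_ROUTING_KEY" pipeline.error | curl -sS -X POST \
#     -H "Content-Type: application/json" \
#     --data-binary @- "https://events.pagerduty.com/v2/enqueue"
```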

Environment overrides (KZERO_*)

Environment variables use the Viper prefix KZERO_, with dots and dashes in the YAML key mapped to underscores (same as other config keys):

| Env var | YAML key |
| --- | --- |
| `KZERO_NOTIFY_ON_ERROR` | `notify.on_error` |
| `KZERO_NOTIFY_SLACK_ENABLED` | `notify.slack.enabled` |
| `KZERO_NOTIFY_SLACK_WEBHOOK_URL` | `notify.slack.webhook_url` |
| `KZERO_NOTIFY_DISCORD_ENABLED` | `notify.discord.enabled` |
| `KZERO_NOTIFY_DISCORD_WEBHOOK_URL` | `notify.discord.webhook_url` |
| `KZERO_NOTIFY_TEAMS_ENABLED` | `notify.teams.enabled` |
| `KZERO_NOTIFY_TEAMS_WEBHOOK_URL` | `notify.teams.webhook_url` |
| `KZERO_NOTIFY_PAGERDUTY_ENABLED` | `notify.pagerduty.enabled` |
| `KZERO_NOTIFY_PAGERDUTY_ROUTING_KEY` | `notify.pagerduty.routing_key` |
| `KZERO_NOTIFY_WEBHOOK_ENABLED` | `notify.webhook.enabled` |
| `KZERO_NOTIFY_WEBHOOK_URL` | `notify.webhook.url` |
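
The naming rule behind this table can be applied mechanically to any key. The helper below is a minimal sketch of that rule, uppercasing the key as the table entries do. The function name kzero_env_name is hypothetical and not a kzero command.

```shell
#!/bin/sh
# Hypothetical helper: derive the env var name for a YAML key using the
# documented rule (KZERO_ prefix, dots and dashes become underscores).
kzero_env_name() {
    printf 'KZERO_%s\n' \
        "$(printf '%s' "$1" | sed 's/[.-]/_/g' | tr '[:lower:]' '[:upper:]')"
}

kzero_env_name notify.slack.webhook_url   # prints KZERO_NOTIFY_SLACK_WEBHOOK_URL
kzero_env_name run.mode                   # prints KZERO_RUN_MODE
```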

Example: test Slack without editing YAML on disk:

export KZERO_NOTIFY_SLACK_ENABLED=true
export KZERO_NOTIFY_SLACK_WEBHOOK_URL="https://hooks.slack.com/services/…"
kzero notify test --config ./kzero.yaml

Never commit webhook URLs or routing keys. Use env vars, CI secrets, or a secret manager.

Live pipeline workflow

Recommended order for a new profile:

# 1. Plan only
kzero analyze --config ./kzero.yaml

# 2. Verify notify channels (no API mutations)
kzero notify test --config ./kzero.yaml
kzero notify test -c ./kzero.yaml --event pipeline.error

# 3. Dry-run pipeline (no notify, no mutations)
kzero down --config ./kzero.yaml   # run.mode: dry-run

# 4. Live run (notify fires on start / success / error)
export KZERO_RUN_MODE=live
kzero reset --config ./kzero.yaml

See also kzero-selfhosted/run/docs/automation-and-pipelines.md for CI/cron patterns.

Operator audit fields in payloads

When configured, notifications include client_id from client.id and cluster metadata from cluster.name. The Kubernetes target: block also prints os_user / os_uid (hooks receive KZERO_OS_USER / KZERO_OS_UID). See SPECIFICATIONS.md.

Troubleshooting

| Symptom | Check |
| --- | --- |
| No message on down / up | `run.mode` must be `live`; dry-run skips pipeline notify |
| `notify test: no notify channel enabled` | At least one `*.enabled: true` or matching `KZERO_NOTIFY_*_ENABLED` |
| HTTP 4xx from webhook | URL, auth headers, and firewall egress |
| Duplicate error alerts | `pipeline.error` fires before the on-error hook; the hook may send its own alert |
| Secrets in logs | kzero redacts webhook URLs in notify error messages; keep URLs out of committed YAML |
| Live reset ran but no Slack after API outage | v0.7.3: notify POST failures are not logged, and there is no mid-pipeline API watchdog; see pipeline-network-loss.md and plan-0.8.x.md |

Contract details: SPECIFICATIONS.md (notify and kzero notify test).

Production live reset: run kzero notify test --event pipeline.error before destructive work, tee stdout to a log file, and consider an external watchdog (see pipeline-network-loss.md).
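
The production advice above can be combined into a single wrapper. This is a minimal sketch: the function name guarded_reset is hypothetical, it assumes kzero exits non-zero on failure (this document only states exit 0 on a successful notify test), and it captures stderr along with stdout in the log.

```shell
#!/bin/sh
# Hypothetical wrapper: verify alerting, then run a live reset with a log.
# Usage: guarded_reset CONFIG LOGFILE
guarded_reset() {
    config=$1 logfile=$2

    # Abort before any destructive work if the test alert cannot be sent.
    if ! kzero notify test --config "$config" --event pipeline.error; then
        echo "notify test failed; refusing to run live reset" >&2
        return 1
    fi

    # Record kzero's own exit status; a plain `| tee` would report
    # tee's status instead.
    status_file=$(mktemp)
    {
        KZERO_RUN_MODE=live kzero reset --config "$config" 2>&1
        echo "$?" > "$status_file"
    } | tee "$logfile"
    status=$(cat "$status_file")
    rm -f "$status_file"

    # Treat a missing status (for example, an interrupted run) as failure.
    return "${status:-1}"
}
```

A call such as guarded_reset ./kzero.yaml ./reset.log leaves a log on disk even when notify POST failures are not logged by kzero itself.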