
feat(domain): add cluster entity, agent, cli, and web ui for kubernetes management#566

Open
arielshad wants to merge 26 commits into main from feat/cluster-agent

Conversation

@arielshad
Contributor

Summary

Introduces a new first-class Cluster domain entity alongside Repository, Application, and Feature. A Cluster represents a managed Kubernetes environment (k3s-in-Docker) that the Shep platform provisions and operates autonomously. This is an XL feature that spans every architecture layer, from the domain model through the web UI.

What's included

  • TypeSpec domain model: Cluster entity (extending SoftDeletableEntity), ClusterStatus enum (6 lifecycle states), ClusterRepository and ClusterApplication junction entities for many-to-many relationships
  • Persistence layer: 4 SQLite migrations (clusters table, junction tables, feature flag column), cluster mapper with camelCase-to-snake_case conversion, full SQLiteClusterRepository implementation
  • Application layer: 12 use cases (CRUD, link/unlink repos and apps, provision, destroy, status), 4 port interfaces (IK3dService, IKubectlService, IArgoCDService, IDockerHealthService)
  • Infrastructure services: K3d, Kubectl, ArgoCD, and DockerHealth service implementations wrapping CLI tools via execFile
  • Cluster agent: LangGraph state machine with 6 lifecycle nodes (prerequisite-check, provision, configure-kubectl, install-argocd, health-check, ready) plus a handle-error node, detached worker process with heartbeat reporting
  • CLI: shep cluster command group with 7 subcommands (new, ls, show, del, link, unlink, status)
  • Web UI: ClusterNode canvas component, ClusterStatusBadge, ClusterCreateDrawer, ClusterDetailDrawer with tabs, sidebar navigation (gated by feature flag), SSE event support, server actions, API routes
  • Feature flag: clusters flag wired through all layers (TypeSpec, migration, mapper, repository, settings defaults, web feature-flags, translations in all 9 locales)
  • DI wiring: New register-cluster.ts module registering all cluster components in the tsyringe container
  • Full test coverage: Unit tests for all use cases, mappers, factories, services; integration tests for SQLite repository and migrations

Key design decisions

  • k3d for provisioning (not raw Docker + k3s) — eliminates Docker orchestration code
  • Single-node clusters in v1 — multi-node deferred, nodeCount field reserved
  • Junction tables for many-to-many relationships (follows work_item_relations pattern)
  • ArgoCD opt-in per cluster — avoids overhead for simple dev clusters
  • Terraform deferred to v2 — core value is k8s management, not cloud IaC
  • Feature flag gates UI only — domain model and persistence always present

Evidence

| Artifact | Description |
| --- | --- |
| Unit tests | 524 test files, 6735 tests passing |
| Integration tests | 59 test files, 793 tests passing |
| Typecheck | tsc --noEmit passes with zero errors |
| Build | pnpm build succeeds |
| Lint | eslint --max-warnings 0 passes |
| CLI commands | Full cluster workflow (create, list, show, status, delete) |
| Settings flags | Clusters feature flag in settings |
| Status badges | All 6 ClusterStatus states in Storybook |
| Canvas | Control center canvas with cluster integration |

Test plan

  • pnpm typecheck passes (zero errors)
  • pnpm lint passes (zero warnings)
  • pnpm build succeeds
  • pnpm test:unit — 6735 tests pass across 524 files
  • pnpm test:int — 793 tests pass across 59 files
  • CI pipeline validates all checks
  • Manual verification: shep cluster new test-cluster --argocd creates cluster entity
  • Manual verification: Settings > Flags shows Clusters toggle
  • Manual verification: Storybook renders all cluster components

🐑 Built with Shep.bot

arielshad and others added 23 commits April 16, 2026 00:28
Co-Authored-By: Shep Bot <shep-agent@users.noreply.github.com>
Co-Authored-By: Shep Bot <shep-agent@users.noreply.github.com>
…ster-agent

Co-Authored-By: Shep Bot <shep-agent@users.noreply.github.com>
…r-agent

Co-Authored-By: Shep Bot <shep-agent@users.noreply.github.com>
Co-Authored-By: Shep Bot <shep-agent@users.noreply.github.com>
Co-Authored-By: Shep Bot <shep-agent@users.noreply.github.com>
…ships

Co-Authored-By: Shep Bot <shep-agent@users.noreply.github.com>
Co-Authored-By: Shep Bot <shep-agent@users.noreply.github.com>
Create four migrations for the cluster persistence layer:
- 085: clusters table with unique partial slug index and status index
- 086: cluster_repositories junction table with compound unique index
- 087: cluster_applications junction table with compound unique index
- 088: feature_flag_clusters column on settings table

Co-Authored-By: Shep Bot <shep-agent@users.noreply.github.com>
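
As a rough illustration of the migrations commit above, the sketch below shows how migrations 085 and 086 could be expressed as idempotent DDL. Only the table names, the partial unique slug index, the status index, and the compound unique index come from the commit message. Every column name, index name, the `Migration` shape, and the `pendingStatements` helper are assumptions made for illustration.

```typescript
// Hypothetical migration shape; the real project's migration runner is not shown in this PR page.
interface Migration {
  version: number;
  statements: string[];
}

const migration085: Migration = {
  version: 85,
  statements: [
    // Column names are assumed for illustration only.
    `CREATE TABLE IF NOT EXISTS clusters (
       id TEXT PRIMARY KEY,
       slug TEXT NOT NULL,
       status TEXT NOT NULL,
       created_at INTEGER NOT NULL,
       updated_at INTEGER NOT NULL,
       deleted_at INTEGER
     )`,
    // Partial unique index: slug uniqueness applies only to rows that are not soft-deleted.
    `CREATE UNIQUE INDEX IF NOT EXISTS idx_clusters_slug
       ON clusters (slug) WHERE deleted_at IS NULL`,
    `CREATE INDEX IF NOT EXISTS idx_clusters_status ON clusters (status)`,
  ],
};

const migration086: Migration = {
  version: 86,
  statements: [
    `CREATE TABLE IF NOT EXISTS cluster_repositories (
       cluster_id TEXT NOT NULL,
       repository_id TEXT NOT NULL,
       created_at INTEGER NOT NULL
     )`,
    // Compound unique index: a repository can be linked to a given cluster only once.
    `CREATE UNIQUE INDEX IF NOT EXISTS idx_cluster_repositories_pair
       ON cluster_repositories (cluster_id, repository_id)`,
  ],
};

// Returns the DDL for every migration newer than the applied version, in version order.
function pendingStatements(applied: number, migrations: Migration[]): string[] {
  const pending = migrations
    .filter((m) => m.version > applied)
    .sort((a, b) => a.version - b.version);
  const out: string[] = [];
  for (const m of pending) {
    for (const s of m.statements) out.push(s);
  }
  return out;
}
```

Because every statement uses IF NOT EXISTS, re-running the same migration leaves the schema unchanged, which is one plausible way to get the idempotent behavior the integration tests check.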
Implement cluster.mapper.ts with ClusterRow interface, toDatabase and
fromDatabase functions mapping all Cluster entity fields including
boolean-to-integer and Date-to-unix-ms conversion. Unit tests cover
all field mappings and round-trip preservation.

Co-Authored-By: Shep Bot <shep-agent@users.noreply.github.com>
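
The mapper conversions described above (camelCase to snake_case, boolean to integer, Date to unix milliseconds) can be sketched as follows. The function names `toDatabase` and `fromDatabase` and the `ClusterRow` name come from the commit message. The specific fields shown here are a hypothetical subset, not the actual Cluster entity.

```typescript
// Hypothetical subset of the Cluster entity; the real field list is not shown in this PR page.
interface Cluster {
  id: string;
  slug: string;
  argocdEnabled: boolean;
  createdAt: Date;
  deletedAt: Date | null;
}

// Row shape as stored in SQLite: snake_case names, integers for booleans and timestamps.
interface ClusterRow {
  id: string;
  slug: string;
  argocd_enabled: number;
  created_at: number;
  deleted_at: number | null;
}

function toDatabase(cluster: Cluster): ClusterRow {
  return {
    id: cluster.id,
    slug: cluster.slug,
    argocd_enabled: cluster.argocdEnabled ? 1 : 0,
    created_at: cluster.createdAt.getTime(), // unix milliseconds
    deleted_at: cluster.deletedAt === null ? null : cluster.deletedAt.getTime(),
  };
}

function fromDatabase(row: ClusterRow): Cluster {
  return {
    id: row.id,
    slug: row.slug,
    argocdEnabled: row.argocd_enabled === 1,
    createdAt: new Date(row.created_at),
    deletedAt: row.deleted_at === null ? null : new Date(row.deleted_at),
  };
}
```

A round trip through `toDatabase` and `fromDatabase` should return an equal entity, which is the property the commit's round-trip tests are described as covering.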
…entation

Define IClusterRepository output port with CRUD, junction table management
for cluster-repository and cluster-application many-to-many relationships.
Implement SQLiteClusterRepository with parameterized queries, mapper
integration, and soft-delete support. Integration tests validate all 12
methods against in-memory SQLite including idempotent migrations, slug
uniqueness, status filtering, and junction table operations.

Co-Authored-By: Shep Bot <shep-agent@users.noreply.github.com>
Add infrastructure port interfaces (IK3dService, IKubectlService,
IArgoCDService, IDockerHealthService, IClusterAgentProcessService) and
implement all 12 cluster use cases: CRUD (Create, Get, List, Update,
Delete), junction management (LinkRepository, UnlinkRepository,
LinkApplication, UnlinkApplication), and lifecycle operations
(ProvisionCluster, DestroyCluster, GetClusterStatus).

All 45 unit tests pass with mocked port dependencies.

Co-Authored-By: Shep Bot <shep-agent@users.noreply.github.com>
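
To illustrate how a use case can depend only on port interfaces and be unit-tested with mocks, here is a minimal sketch. The interface and use-case names come from the commit message. The method names (`isAvailable`, `spawn`, `execute`), the result shape, and the Docker check inside the use case are assumptions, not the PR's actual signatures or behavior.

```typescript
// Port interfaces: names are from the PR, method signatures are hypothetical.
interface IDockerHealthService {
  isAvailable(): Promise<boolean>;
}

interface IClusterAgentProcessService {
  spawn(clusterId: string): Promise<void>;
}

type ProvisionResult =
  | { started: true }
  | { started: false; reason: string };

class ProvisionClusterUseCase {
  private readonly docker: IDockerHealthService;
  private readonly agent: IClusterAgentProcessService;

  constructor(docker: IDockerHealthService, agent: IClusterAgentProcessService) {
    this.docker = docker;
    this.agent = agent;
  }

  // Refuses to start the agent when Docker is unreachable (an assumed precondition).
  async execute(clusterId: string): Promise<ProvisionResult> {
    if (!(await this.docker.isAvailable())) {
      return { started: false, reason: "Docker daemon is not available" };
    }
    await this.agent.spawn(clusterId);
    return { started: true };
  }
}
```

In a unit test, both ports can be replaced by plain objects, so no Docker daemon or worker process is needed to exercise the use case.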
Implement four CLI-wrapping infrastructure services, a worker process
service, and error classes for the cluster agent feature:

- K3dService: wraps k3d CLI for cluster lifecycle (create, delete,
  start, stop, status, kubeconfig extraction)
- KubectlService: wraps kubectl CLI for manifest apply, pod/service
  listing, namespace listing, and wait-for-ready with --kubeconfig
  isolation per NFR-2
- ArgoCDService: installs and manages ArgoCD via kubectl apply of
  official manifests (no separate argocd binary required)
- DockerHealthService: checks Docker daemon availability via docker info
- ClusterAgentProcessService: spawns detached worker processes for
  cluster provisioning following the feature-agent-worker pattern
- Custom error classes (K3dError, KubectlError, DockerError) with typed
  error codes for programmatic error handling

Co-Authored-By: Shep Bot <shep-agent@users.noreply.github.com>
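
The CLI-wrapping pattern with typed error codes described above might look roughly like this. The class names `K3dService` and `K3dError` come from the commit message, and `k3d cluster create` and `k3d kubeconfig get` are real k3d commands. The method names, the error-code values, and the injected `ExecFileFn` type are assumptions. In production the injected function would typically be a promisified `execFile` from `node:child_process`. It is injected here so the sketch runs without k3d installed.

```typescript
// Hypothetical runner signature, shaped like a promisified execFile.
type ExecFileFn = (file: string, args: string[]) => Promise<{ stdout: string; stderr: string }>;

// Error codes are illustrative; the PR does not list the actual codes.
type K3dErrorCode = "CREATE_FAILED" | "KUBECONFIG_FAILED";

class K3dError extends Error {
  readonly code: K3dErrorCode;

  constructor(code: K3dErrorCode, message: string) {
    super(message);
    this.name = "K3dError";
    this.code = code;
  }
}

class K3dService {
  private readonly exec: ExecFileFn;

  constructor(exec: ExecFileFn) {
    this.exec = exec;
  }

  async create(name: string): Promise<void> {
    await this.run(["cluster", "create", name], "CREATE_FAILED");
  }

  async kubeconfig(name: string): Promise<string> {
    return (await this.run(["kubeconfig", "get", name], "KUBECONFIG_FAILED")).trim();
  }

  // Arguments are passed as an array, so no shell interpolation takes place.
  private async run(args: string[], code: K3dErrorCode): Promise<string> {
    try {
      const { stdout } = await this.exec("k3d", args);
      return stdout;
    } catch (err) {
      const detail = err instanceof Error ? err.message : String(err);
      throw new K3dError(code, `k3d ${args.join(" ")} failed: ${detail}`);
    }
  }
}
```

Callers can then branch on `error.code` rather than parsing stderr text, which is the kind of programmatic error handling the commit message mentions.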
Implement the cluster agent as a LangGraph StateGraph with typed state
channels (ClusterAgentAnnotation), 7 node factories (prerequisite-check,
provision, configure-kubectl, install-argocd, health-check, ready,
handle-error), and a graph factory with conditional ArgoCD routing.

Co-Authored-By: Shep Bot <shep-agent@users.noreply.github.com>
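
The conditional ArgoCD routing mentioned above can be pictured with a dependency-free sketch. This is deliberately not LangGraph code: it is a plain function showing one plausible transition table between the seven named nodes. The node names come from the commit message. The node order, the `ClusterAgentState` fields, and the rule that any recorded error routes to handle-error are assumptions.

```typescript
type NodeName =
  | "prerequisite-check"
  | "provision"
  | "configure-kubectl"
  | "install-argocd"
  | "health-check"
  | "ready"
  | "handle-error";

// Hypothetical slice of the agent state; the real ClusterAgentAnnotation is not shown.
interface ClusterAgentState {
  argocdEnabled: boolean;
  error: string | null;
}

// Returns the node to run next, or null when the graph has reached a terminal node.
function nextNode(current: NodeName, state: ClusterAgentState): NodeName | null {
  if (current === "ready" || current === "handle-error") return null;
  if (state.error !== null) return "handle-error";
  switch (current) {
    case "prerequisite-check":
      return "provision";
    case "provision":
      return "configure-kubectl";
    case "configure-kubectl":
      // Conditional edge: skip the ArgoCD install when the cluster did not opt in.
      return state.argocdEnabled ? "install-argocd" : "health-check";
    case "install-argocd":
      return "health-check";
    case "health-check":
      return "ready";
  }
}

// Walks the transitions from the entry node and collects the visited path.
function plannedPath(state: ClusterAgentState): NodeName[] {
  const path: NodeName[] = [];
  let current: NodeName | null = "prerequisite-check";
  while (current !== null) {
    path.push(current);
    current = nextNode(current, state);
  }
  return path;
}
```

In a LangGraph StateGraph, the branch at configure-kubectl would correspond to a conditional edge, with the remaining transitions as ordinary edges.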
Implement cluster-agent-worker.ts as a detached Node.js entry point
following the feature-agent-worker pattern. Parses --cluster-id,
--run-id, --argocd-enabled, --argocd-namespace, --resume, --thread-id
args. Initializes DI, resolves infrastructure services, creates the
cluster agent graph with SQLite checkpointer, runs heartbeat, and
handles SIGTERM/uncaughtException gracefully.

Co-Authored-By: Shep Bot <shep-agent@users.noreply.github.com>
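
A worker entry point that accepts the flags listed above could parse them with Node's built-in `parseArgs` from `node:util`, as in this sketch. The flag names come from the commit message. Whether `--argocd-enabled` and `--resume` are value-less booleans, which flags are required, and the `WorkerArgs` shape are all assumptions.

```typescript
import { parseArgs } from "node:util";

// Hypothetical parsed shape; the worker's real option type is not shown in this PR page.
interface WorkerArgs {
  clusterId: string;
  runId: string;
  argocdEnabled: boolean;
  argocdNamespace: string | undefined;
  resume: boolean;
  threadId: string | undefined;
}

function parseWorkerArgs(argv: string[]): WorkerArgs {
  const { values } = parseArgs({
    args: argv,
    options: {
      "cluster-id": { type: "string" },
      "run-id": { type: "string" },
      "argocd-enabled": { type: "boolean" },
      "argocd-namespace": { type: "string" },
      resume: { type: "boolean" },
      "thread-id": { type: "string" },
    },
    strict: true, // unknown flags cause parseArgs to throw
  });

  const clusterId = values["cluster-id"];
  const runId = values["run-id"];
  // Treating these two flags as mandatory is an assumption.
  if (!clusterId || !runId) {
    throw new Error("--cluster-id and --run-id are required");
  }

  return {
    clusterId,
    runId,
    argocdEnabled: values["argocd-enabled"] === true,
    argocdNamespace: values["argocd-namespace"],
    resume: values.resume === true,
    threadId: values["thread-id"],
  };
}
```

A real entry point would call this with `process.argv.slice(2)` before initializing the DI container.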
Add clusters boolean to TypeSpec FeatureFlags model and regenerate.
Wire through settings mapper, sqlite repository, defaults factory,
web feature-flags resolver, settings page toggle, and React context.
Add translation strings in all 9 locales. Update FeatureFlags fixtures
across stories, unit tests, and integration tests.

Co-Authored-By: Shep Bot <shep-agent@users.noreply.github.com>
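
The flag wiring described above, from a defaults factory through a resolver to a gated sidebar item, can be sketched as below. Only the `clusters` flag name and the idea of a defaults factory and resolver come from the commit message. The function names, the `NavItem` shape, the `/clusters` link used as the gated item, and the default value of `false` are assumptions.

```typescript
// Only the flag relevant to this PR is shown; the real FeatureFlags model has other members.
interface FeatureFlags {
  clusters: boolean;
}

// Defaults factory; defaulting to false is an assumption, not confirmed by the PR.
function createDefaultFeatureFlags(): FeatureFlags {
  return { clusters: false };
}

// Resolver: stored settings override the defaults, missing keys fall back to them.
function resolveFeatureFlags(stored: Partial<FeatureFlags> | undefined): FeatureFlags {
  return { ...createDefaultFeatureFlags(), ...stored };
}

// Hypothetical navigation item; an item without a flag is always visible.
interface NavItem {
  label: string;
  href: string;
  flag?: keyof FeatureFlags;
}

function visibleNavItems(items: NavItem[], flags: FeatureFlags): NavItem[] {
  return items.filter((item) => item.flag === undefined || flags[item.flag]);
}
```

Since the flag gates only the UI, the domain model and persistence code would not consult `FeatureFlags` at all. Only presentation-layer helpers like `visibleNavItems` would.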
Create register-cluster.ts with repository, infrastructure services
(k3d, kubectl, argocd, docker-health), agent process service, 12 use
cases, and string-token aliases for web routes. Wire into container.

Co-Authored-By: Shep Bot <shep-agent@users.noreply.github.com>
Add shep cluster command group with new, ls, show, del, link, unlink,
and status subcommands. Each command resolves use cases from the DI
container and uses i18n for all user-facing strings. Includes
resolve-cluster helper for flexible ID/slug/prefix resolution and
unit tests for new, ls, and del commands.

Co-Authored-By: Shep Bot <shep-agent@users.noreply.github.com>
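
The flexible ID/slug/prefix resolution mentioned above could work along these lines. This is a guess at the behavior, not the actual resolve-cluster helper: the `ClusterRef` and `Resolution` types, the precedence of exact matches over prefixes, and the handling of ambiguous prefixes are all assumptions.

```typescript
// Minimal reference shape, hypothetical.
interface ClusterRef {
  id: string;
  slug: string;
}

type Resolution =
  | { kind: "found"; cluster: ClusterRef }
  | { kind: "not-found" }
  | { kind: "ambiguous"; matches: ClusterRef[] };

function resolveCluster(input: string, clusters: ClusterRef[]): Resolution {
  if (input.length === 0) return { kind: "not-found" };

  // 1. An exact ID or slug match wins outright.
  const exact = clusters.find((c) => c.id === input || c.slug === input);
  if (exact) return { kind: "found", cluster: exact };

  // 2. Otherwise treat the input as an ID prefix, which must match exactly one cluster.
  const byPrefix = clusters.filter((c) => c.id.startsWith(input));
  if (byPrefix.length === 1) return { kind: "found", cluster: byPrefix[0] };
  if (byPrefix.length === 0) return { kind: "not-found" };
  return { kind: "ambiguous", matches: byPrefix };
}
```

Returning a tagged result rather than throwing lets each subcommand decide how to report a missing or ambiguous cluster through its own i18n strings.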
Add complete web UI layer for cluster management including:
- ClusterStatusBadge with color/pulse for all 6 statuses
- ClusterNode canvas component with handles and linked counts
- ClusterCreateDrawer with name, description, and ArgoCD toggle
- ClusterDetailDrawer with overview, repos, apps, and status tabs
- 12 server actions for cluster CRUD operations
- 6 API routes for cluster REST endpoints
- useClusterEvents polling hook for live status updates
- Canvas integration via deriveGraph and buildGraphNodes
- Sidebar navigation item gated by featureFlags.clusters
- Cluster i18n translation keys

Co-Authored-By: Shep Bot <shep-agent@users.noreply.github.com>
Co-Authored-By: Shep Bot <shep-agent@users.noreply.github.com>
Co-Authored-By: Shep Bot <shep-agent@users.noreply.github.com>
Co-Authored-By: Shep Bot <shep-agent@users.noreply.github.com>
Co-Authored-By: Shep Bot <shep-agent@users.noreply.github.com>
@github-actions
Contributor

Dev Release Published

| Artifact | Version | Install |
| --- | --- | --- |
| npm | 1.188.0-pr566.655e28f | npm install -g @shepai/cli@1.188.0-pr566.655e28f |

Published from commit b41f857 | View CI

arielshad and others added 2 commits April 16, 2026 19:12
Co-Authored-By: Shep Bot <shep-agent@users.noreply.github.com>
Co-Authored-By: Shep Bot <shep-agent@users.noreply.github.com>
@github-actions
Contributor

Dev Release Published

| Artifact | Version | Install |
| --- | --- | --- |
| npm | 1.188.0-pr566.0fb60a1 | npm install -g @shepai/cli@1.188.0-pr566.0fb60a1 |

Published from commit 7e47ccc | View CI

Add a /clusters page that lists clusters, supports create-and-provision
in one click, and exposes provision/destroy/delete actions plus a detail
drawer. Surface docker, k3d, and kubectl as installable tools under a new
clusters tab on the tools page so users can satisfy cluster prerequisites
from the UI. Fill in cluster.* english translations so the create and
detail drawers stop showing raw i18n keys.

Co-Authored-By: Shep Bot <shep-agent@users.noreply.github.com>
