software factory: explore running it only using Claude Code + Skills + Boxel CLI #4843
jurgenwerk wants to merge 20 commits
Documents the prompt sequence a user runs in a subscription-billed Claude Code session to drive a full factory run without the SDK orchestrator. Tracks the gaps that must be closed first: missing `boxel lint` / `parse` / `test` CLI commands, skill rewrites to drop factory-MCP-tool references, a new scheduling skill encoding the issue-pickup rules currently in `issue-scheduler.ts`. CS-11149. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Lints every lintable (.gts/.gjs/.ts/.js) file in a realm via the realm `_lint` endpoint, or a single file when a realm-relative path is passed. Aggregates per-file violations into a single summary; exits non-zero on any error-severity violation. This is the realm-wide companion to the existing single-file `boxel file lint <path>` command. Closes the first gap from the Phase 1 runbook (CS-11149): the software factory's `runLintInMemory` validator becomes reachable from an interactive Claude Code session via Bash. `software-factory` keeps its in-process `runLintInMemory` for now; both coexist during the migration window. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Lifts the factory's glint runner + JSON document validator from `packages/software-factory/src/parse-execution.ts` into a top-level `boxel parse` command in boxel-cli. Behavior matches the existing factory tool:
- Without a path: discovers every `.gts` / `.gjs` / `.ts` in the realm plus every `.json` file linked as a `Spec.linkedExamples`, runs glint (`ember-tsc`) over the GTS batch in a temp dir with monorepo-aware tsconfig paths, and validates the document structure of each JSON example.
- With a path: parses just that single file (GTS → glint, JSON → document validation).
Path resolution is anchored on this file's `__dirname`, so the command requires the Boxel monorepo layout: `packages/base`, `packages/host`, `packages/boxel-ui`, and `@glint/ember-tsc` (added as a boxel-cli devDependency) must all be resolvable. This is a factory-developer tool, not an end-user CLI feature.
The factory keeps its own copy of `parse-execution.ts` for now; both coexist during the CS-11149 migration window.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Lifts the factory's in-memory QUnit runner (`runTestsInMemory`) from
`packages/software-factory/src/test-run-execution.ts` into a
top-level `boxel test` command in boxel-cli. The runner:
- Discovers every `*.test.gts` file in the realm.
- Locates the host app's compiled `dist/` (env override, sibling
packages/host, or the root checkout when in a git worktree).
- Spins up a tiny HTTP server that serves the host's test bundles
+ a synthesized QUnit harness page with live-test enabled.
- Drives a headless Chromium against that page with the realm URL
in the query string; injects the per-realm JWT (if the active
profile has one) via `page.route()` so private realms can be
reached.
- Collects per-test QUnit results via `QUnit.on('testEnd' / 'runEnd')`
hooks and aggregates them into pass/fail/skip counts + per-failure
details.
Unlike the factory's `executeTestRunFromRealm`, this command does
NOT create or update a TestRun card — results are returned in-memory
only. Card persistence is the agent's responsibility in the new
Phase 1 flow.
`@playwright/test` is added as a boxel-cli devDependency. The
`findHostDistPackageDir` discovery helper is inlined from
`@cardstack/realm-test-harness/host-dist` to avoid pulling the
harness in as a dependency. Like `boxel parse`, this is a
monorepo-only command and not usable from the published CLI.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The three validator CLI commits (`boxel lint`, `boxel parse`, `boxel test`) closed the first chunk of the runbook's gaps section. This commit brings the runbook in line with what was actually built:
- Capability table no longer marks the three commands as missing.
- Gaps section is rewritten to document them as landed, with the monorepo-only caveats for parse and test (host dist + glint paths are resolved relative to the CLI), and the deferred realm-server `_parse` endpoint mentioned as a separate Phase 2 ticket.
The remaining gaps section (skill rewrites) is unchanged; that is the next slice of work.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
New skill that captures the loop-control logic the orchestrator used to own — picking the next unblocked Issue from a target realm, transitioning its status across the lifecycle (`backlog → in_progress → done | blocked`), pushing between flips, and bailing out cleanly when the backlog is exhausted. Lifts the rules from `src/issue-scheduler.ts` (eligibility filter + ordering) and the status-flip choreography from `src/issue-loop.ts` into agent-readable markdown. Also documents how to discover the tracker module URL — previously injected into the agent's system prompt by the orchestrator's `inferDarkfactoryModuleUrl` — and how to assemble per-issue context from the issue's relationships (`project`, `relatedKnowledge`), replacing `factory-context-builder.ts`. Companion to the existing `software-factory-bootstrap` and `software-factory-operations` skills; covers "which issue do I work next and how do I record progress" while those cover "what do I do inside the issue I picked." CS-11149. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
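The actual eligibility and ordering rules live in `src/issue-scheduler.ts` and are not reproduced in this conversation. The following is only a rough sketch of what "pick the next unblocked Issue" could look like, assuming an Issue is eligible when it is in `backlog` and everything it depends on is `done`, and that ties are broken by priority and then by `order`. The `blockedBy` field, the priority names other than `critical`, and the function name `pickNextIssue` are all hypothetical.

```typescript
type IssueStatus = "backlog" | "in_progress" | "done" | "blocked";

// Hypothetical, simplified view of an Issue card.
interface Issue {
  id: string;
  status: IssueStatus;
  priority: string; // only "critical" is confirmed by the text; the rest are assumed
  order: number;
  blockedBy: string[]; // assumed field: ids of Issues that must be done first
}

// Assumed ranking; unknown priorities sort last.
const PRIORITY_RANK: Record<string, number> = {
  critical: 0,
  high: 1,
  medium: 2,
  low: 3,
};

function pickNextIssue(issues: Issue[]): Issue | undefined {
  const done = new Set(
    issues.filter((i) => i.status === "done").map((i) => i.id),
  );
  const rank = (i: Issue): number =>
    PRIORITY_RANK[i.priority] ?? Number.MAX_SAFE_INTEGER;

  // filter() returns a fresh array, so sorting it does not mutate the input.
  return issues
    .filter((i) => i.status === "backlog" && i.blockedBy.every((id) => done.has(id)))
    .sort((a, b) => rank(a) - rank(b) || a.order - b.order)[0];
}
```

Returning `undefined` when nothing is eligible corresponds to the "backlog is exhausted" exit: every remaining Issue is `done`, `blocked`, or waiting on something that is.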
Replaces the factory-MCP-tool references throughout with the
corresponding `boxel` CLI invocations now that the agent drives the
loop without an orchestrator:
- `run_lint` → `boxel lint [path] --realm <url>`
- `run_parse` → `boxel parse [path] --realm <url>`
- `run_evaluate` → `boxel run-command evaluate-module --realm <url> --input '{"path": "..."}'`
- `run_instantiate` → `boxel run-command instantiate-card --realm <url> --input '{"path": "..."}'`
- `run_tests` → `boxel test --realm <url>`
- `get_card_schema` → `boxel run-command get-card-type-schema --realm <url> --input '{"module":"...","name":"..."}'`
Drops the `Control Flow` section (`signal_done` /
`request_clarification` no longer exist as tools — they are replaced
by status flips on the Issue card, documented in the new
`software-factory-scheduling` skill).
Removes the "never set status to done" rule — the agent now owns
the full status lifecycle. The `description`-is-immutable rule and
the read-before-write rule stay (they're agent-side discipline, not
orchestrator-enforced).
Adds explicit "push the workspace before running validators"
guidance — the new CLI validators all read from the realm, so a
sync step is required between workspace writes and validation.
Updates the Required Flow section to reflect the new control loop:
write → push → validate → fix → push → mark done. Adds cross-links
to `software-factory-scheduling` for status transitions and to the
boxel-cli plugin skills (boxel-api, boxel-command, realm-sync) for
the underlying CLI surfaces. CS-11149.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Updates the bootstrap skill to match the new interactive Claude
Code flow now that the agent drives bootstrap directly:
- Replace every `get_card_schema({module,name})` reference with
the equivalent `boxel run-command get-card-type-schema` Bash
invocation.
- Add a "Creating the target realm" section — previously the
orchestrator created the realm via the entrypoint before the
agent ran; the agent now does this itself via
`boxel run-command create-realm` (or the realm-sync skill's
sugar variant).
- Add a "Discover the tracker module URL" section — previously
injected into the agent's system prompt as
`darkfactoryModuleUrl`; the agent now constructs it from the
target realm's origin (`<origin>/software-factory/darkfactory`)
and confirms by introspecting one of its exports.
- Drop the `signal_done()` reference from the Completion section
and replace with explicit `boxel push` + status flip on the
bootstrap Issue, cross-linking to the
`software-factory-scheduling` skill for the status-transition
rules.
- Add cross-references to the new
`software-factory-scheduling` and existing
`software-factory-operations` skills so the agent knows where
to go after bootstrap completes.
- Use `boxel push` / `boxel pull` explicitly for sync; reference
the `realm-sync` skill for the canonical command. CS-11149.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
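The tracker-module rule described above (`<origin>/software-factory/darkfactory`) is small enough to sketch. This is a minimal illustration, assuming the target realm URL is absolute; the function name `trackerModuleUrl` and the example host are hypothetical and not part of the skill or the CLI.

```typescript
// Hypothetical helper: derive the tracker module URL from a target realm URL.
function trackerModuleUrl(targetRealmUrl: string): string {
  // `origin` keeps scheme + host + port and drops the realm path.
  const { origin } = new URL(targetRealmUrl);
  return `${origin}/software-factory/darkfactory`;
}

// Prints "https://example.test/software-factory/darkfactory"
console.log(trackerModuleUrl("https://example.test/user/my-realm/"));
```

The skill still asks the agent to confirm the URL by introspecting one of the module's exports; the sketch covers only the construction step.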
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 29c7c40c24
    "@cardstack/postgres": "workspace:*",
    "@cardstack/runtime-common": "workspace:*",
    "@glint/ember-tsc": "catalog:",
    "@playwright/test": "catalog:",
Move Playwright to runtime deps or lazy-load the test command
`@playwright/test` is added under `devDependencies`, but the CLI eagerly imports `./commands/test` during startup and that module has a top-level `import { chromium } from '@playwright/test'`. In normal npm/global installs, dev dependencies are not installed, so even unrelated commands (for example `boxel --help` or `boxel profile list`) will fail at process start with a module-resolution error before argument parsing. This needs either a runtime dependency or a deferred import inside the test command path.
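The deferred-import option can be sketched generically. This is a minimal illustration and not the CLI's actual command registration: the `CommandHandler` type, the `lazyCommand` wrapper, and the `{ run }` module shape are all assumptions made for the example.

```typescript
// A command handler takes the remaining CLI args and resolves to an exit code.
type CommandHandler = (args: string[]) => Promise<number>;

// Hypothetical wrapper: defer loading a command module until it is invoked,
// so a missing optional dependency cannot break unrelated commands.
function lazyCommand(
  load: () => Promise<{ run: CommandHandler }>,
): CommandHandler {
  let cached: Promise<{ run: CommandHandler }> | undefined;
  return async (args) => {
    if (!cached) cached = load(); // load at most once, on first use
    const mod = await cached;
    return mod.run(args);
  };
}

// In a real CLI the loader would be a dynamic import such as
// `() => import("./commands/test")` (module path and shape assumed here).
```

With this shape, the top-level Playwright import would only be evaluated when the test command actually runs.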
| Realm creation | `boxel run-command create-realm` (or `boxel realm create`) |
| Workspace pull / push | `boxel pull` / `boxel push` (realm-sync skill) |
| Federated search | `boxel search --realm <url> --query '<json>'` (boxel-api skill) |
| Card-type schema | `boxel run-command get-card-type-schema --realm <url> --input '{module,name}'` |
Replace invalid run-command shorthand in factory runbook
The documented invocation `boxel run-command get-card-type-schema ...` is not a valid `run-command` specifier for this CLI; `run-command` expects a full command module reference (for example `@cardstack/boxel-host/commands/get-card-type-schema/default`). Following the runbook as written will error immediately, which blocks bootstrap and per-issue validation flows that depend on these calls. Update the runbook/skills to use valid specifiers (or a supported alternative command).
Host Test Results: 1 files, 1 suites, 1h 41m 11s ⏱️. Results for commit 2e31989.
Realm Server Test Results: 1 files (± 0), 1 suites (+1), 9m 20s ⏱️ (+ 9m 20s). Results for commit 2e31989. ± Comparison against earlier commit 622dc9f.
Three issues surfaced when running the Phase 1 prompt sequence
against the sticky-note brief in a fresh Claude Code session.
1. **`get-card-type-schema` invocation form was wrong.** The skills
said `boxel run-command get-card-type-schema --input '{"module":
"...", "name": "..."}'`. The realm-server actually wants the
fully-qualified command specifier
`@cardstack/boxel-host/commands/get-card-type-schema/default`
and a `codeRef` wrapper around the module+name:
boxel run-command @cardstack/boxel-host/commands/get-card-type-schema/default \
--realm <url> --input '{"codeRef":{"module":"...","name":"..."}}'
Matches the canonical form documented in the `boxel-command`
plugin skill. Updated everywhere the new skills + runbook show
the command, but kept short prose references ("the schema you
fetched with `get-card-type-schema`") since those are
informational, not copy-pasteable.
2. **Project does not carry a `board` relationship.** The
bootstrap skill instructed populating
`relationships.board → ../Boards/<slug>` on the Project card.
Live schema introspection shows Project has no `board` field;
the link is one-way IssueTracker → Project via the board's own
`project` relationship. Removed the Project-side link from the
bootstrap guidance and added an explicit note about the one-way
direction.
3. **Dev `boxel` CLI is not on PATH during Phase 1.** The test
agent had to discover the dev binary at
`<monorepo>/packages/boxel-cli/bin/boxel.js`, work around a
stale `dist/`, and symlink onto PATH itself. That work
shouldn't be left to improvisation. Added a "First: verify the
`boxel` CLI works" section at the top of the bootstrap skill
documenting the dev-binary location, the two safe ways to use
it (direct `node` invocation or PATH symlink), and the
stale-dist gotcha. Mirrored the setup advice in the runbook
prerequisites. Phase 1 only — Phase 2 packaging will retire
this wrinkle.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
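For item 1, the corrected invocation can be assembled programmatically so the `codeRef` wrapper is never forgotten. This is a small sketch under the assumption that the agent (or a wrapper script) builds an argument vector for `boxel`; the helper name `schemaCommandArgs` is hypothetical, while the command specifier and input shape come from the commit message above.

```typescript
// Fully-qualified specifier quoted in the commit message.
const GET_SCHEMA_COMMAND =
  "@cardstack/boxel-host/commands/get-card-type-schema/default";

// Hypothetical helper: build the argv that follows `boxel` on the command line.
function schemaCommandArgs(
  realmUrl: string,
  moduleUrl: string,
  name: string,
): string[] {
  // The module + name pair must be wrapped in a `codeRef` object.
  const input = JSON.stringify({ codeRef: { module: moduleUrl, name } });
  return ["run-command", GET_SCHEMA_COMMAND, "--realm", realmUrl, "--input", input];
}
```

Passing the result as an argument array (rather than a single shell string) avoids quoting problems with the JSON payload.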
Previously the skills said "verify `boxel --version` works; if it doesn't, fall back to the dev binary." That left the agent guessing when the test session found `boxel` missing. Make the dev wiring the default and unconditional path during Phase 1:
- `software-factory-scheduling` now opens with an idempotent wiring block that derives the monorepo root from `git rev-parse --show-toplevel`, renames any stale `dist/` so the bin shim falls back to ts-node, symlinks `packages/boxel-cli/bin/boxel.js` into `~/.local/bin/boxel`, and prepends `~/.local/bin` to PATH. Verified with `boxel --version` + a check that `boxel --help` lists `lint` / `parse` / `test`.
- `software-factory-bootstrap` carries the same wiring block (it is the first skill loaded when working a bootstrap issue directly, before scheduling). Also fixes the realm-creation example: native subcommand `boxel realm create <endpoint> "<display name>"`, not `boxel run-command create-realm`.
- `software-factory-operations` adds a brief prerequisite pointing back to scheduling/bootstrap for the wiring block, since every command in the body uses `boxel`.
Phase 1 workaround: when boxel-cli ships properly the wiring disappears and `boxel` is just installed globally.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Second-test-run fix-up. The skills said `boxel pull --realm <url>` and `boxel push --realm <url>` as if those were top-level commands; they aren't. The actual surface is under the `realm` subcommand with positional arguments:
- `boxel realm pull <realm-url> <local-dir>`
- `boxel realm push <local-dir> <realm-url>`
- `boxel realm sync <local-dir> <realm-url>` (with optional `--prefer-local` / `--prefer-remote` / `--prefer-newest` for conflict resolution)
Updated every occurrence across the bootstrap, operations, and scheduling skills plus the runbook capability table. Added a concrete example next to each so the agent doesn't have to reverse-engineer the argument order. Also updated the "see also" references to point at the `realm-sync` plugin skill (which documents the full surface).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
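Because the argument order flips between pull and push, a tiny builder makes the difference explicit. This is an illustrative sketch only: the helper name `realmSyncArgs` is hypothetical, and it covers just the pull and push forms listed above.

```typescript
type SyncDirection = "pull" | "push";

// Hypothetical helper: build the argv that follows `boxel` for a pull or push.
function realmSyncArgs(
  direction: SyncDirection,
  realmUrl: string,
  localDir: string,
): string[] {
  return direction === "pull"
    ? ["realm", "pull", realmUrl, localDir] // source (realm) first, then destination
    : ["realm", "push", localDir, realmUrl]; // source (local dir) first, then destination
}
```

In both forms the source comes first and the destination second, which is an easy way to remember the order.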
Third-test-run fix-up. The agent passed `boxel realm create user/factory-test-stickynote-2 "..."` and got `Error: realm name must contain only lowercase letters, numbers, and hyphens` (regex `^[a-z0-9-]+$`). The legacy-paths warning that also printed was unrelated noise, not the actual failure mode, which the agent then chased on its next attempt.
The skill said `<realm-endpoint>` was a path segment like `user/my-realm`. That's wrong: the argument is the realm's slug only, and the user-namespace prefix is added by the realm server automatically based on the active profile. Updated the "Creating the target realm" section to:
- Call the argument `<realm-name>` (matching the CLI's own help text).
- State the regex constraint explicitly.
- Show that the agent derives the slug from the target realm URL's final path segment.
- Explicitly tell the agent to ignore the legacy-local-realm-dirs warning unless the command itself exits non-zero.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
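The slug derivation and the regex check can be combined in one step. The sketch below assumes the target realm URL is absolute and that its final non-empty path segment is the realm name; the helper name `realmNameFromUrl` and the example host are hypothetical, while the regex is the one quoted in the error message above.

```typescript
// Constraint quoted from the CLI error message.
const REALM_NAME_PATTERN = /^[a-z0-9-]+$/;

// Hypothetical helper: take the final path segment of the realm URL and
// reject it early if the CLI would refuse it.
function realmNameFromUrl(targetRealmUrl: string): string {
  const segments = new URL(targetRealmUrl).pathname.split("/").filter(Boolean);
  const name = segments[segments.length - 1] ?? "";
  if (!REALM_NAME_PATTERN.test(name)) {
    throw new Error(`invalid realm name: "${name}"`);
  }
  return name;
}

// Prints "factory-test-stickynote-2"
console.log(realmNameFromUrl("https://example.test/user/factory-test-stickynote-2/"));
```

Note that the failing input from the test run, `user/factory-test-stickynote-2`, does not match the pattern because of the slash, which is exactly the error the agent hit.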
Fourth-test-run fix-up. The agent ran the full prompt sequence and
flagged three real bugs in the skill text that I'd glossed over, plus
a gap in the runbook prerequisites.
1. **evaluate-module input shape** — skill said
`--input '{"path": "..."}'`. The real fields are
`{moduleIdentifier, realmIdentifier}`, both absolute URLs. Also
the command spec must be fully qualified
(`@cardstack/boxel-host/commands/evaluate-module/default`). Updated
the operations skill's validator section with a worked example
using `jq -nc` to build the input JSON; also updated the runbook
capability table and the Required Flow shorthand.
2. **instantiate-card input shape** — skill said
`--input '{"path": "..."}'`. The real fields are
`{moduleIdentifier, cardName, realmIdentifier, instanceData?}`,
and `instanceData` (when provided) must be a JSON string whose
`data.meta.adoptsFrom.module` already matches `moduleIdentifier`
exactly — relative paths like `../sticky-note` are rejected.
Updated the validator section with a worked example that uses
`jq` to rewrite the workspace JSON to absolute URLs before
sending. Same updates to the runbook table and Required Flow.
3. **`loader.import` typing in `.test.gts`** — the example test
used `let { StickyNote } = await loader.import(cardModuleUrl);`
which destructures from a `{}` return type and makes
`boxel parse` fail with a type error on every test file. Fixed
the example to `loader.import<typeof import('./sticky-note')>(cardModuleUrl)`
and added an explicit "Why" callout so future agents don't
re-introduce the bug.
4. **Runbook prerequisites** — added the two one-time setup steps
that have to happen before `boxel test` works: building the
host app dist (`pnpm --filter @cardstack/host build`) and
installing Playwright's chromium binary
(`npx playwright install chromium`).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
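The `instantiate-card` requirement in item 2 (absolute `adoptsFrom.module`, `instanceData` passed as a JSON string) can be illustrated without `jq`. This is a sketch under assumptions: the `CardDocument` type is a simplified stand-in for the real document envelope, `instantiateInput` is a hypothetical name, and taking `cardName` from `adoptsFrom.name` is an assumption rather than something the commit message states.

```typescript
// Simplified stand-in for a card instance document (assumed shape).
interface CardDocument {
  data: {
    meta: { adoptsFrom: { module: string; name: string } };
    [key: string]: unknown;
  };
}

// Hypothetical helper: resolve a relative `adoptsFrom.module` against the
// instance's own URL and build the `--input` payload for instantiate-card.
function instantiateInput(
  doc: CardDocument,
  instanceUrl: string,
  realmIdentifier: string,
) {
  const { adoptsFrom } = doc.data.meta;
  const moduleIdentifier = new URL(adoptsFrom.module, instanceUrl).href;
  const rewritten: CardDocument = {
    ...doc,
    data: {
      ...doc.data,
      meta: {
        ...doc.data.meta,
        adoptsFrom: { ...adoptsFrom, module: moduleIdentifier },
      },
    },
  };
  return {
    moduleIdentifier,
    cardName: adoptsFrom.name, // assumption: the card name matches adoptsFrom.name
    realmIdentifier,
    instanceData: JSON.stringify(rewritten), // must be a JSON string, not an object
  };
}
```

For an instance at `https://example.test/user/demo/StickyNote/1.json` that adopts from `../sticky-note`, the resolved module is `https://example.test/user/demo/sticky-note`, and the same value appears inside `instanceData`, which is the exact-match condition the command enforces.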
Earlier I omitted `LintResult` / `ParseResult` / `EvalResult` / `InstantiateResult` / `TestRun` cards from the per-issue workflow, reasoning that the agent reads validator output inline so nothing needs to persist. That was the wrong call: the audit trail in the realm's `Validations/` folder is part of the factory's value, not a nice-to-have. Without those cards the host UI loses the sortable history the SDK orchestrator produced.
Updated `software-factory-operations`:
- Rewrote the "Validators (CLI commands)" preamble: the CLI itself doesn't persist anything; persisting the audit trail is the agent's responsibility. Always run validators with `--json` so the structured result is available to map into the card.
- Added a new "Validation artifact cards" section. Documents the five card types and their source-realm module URLs, the `Validations/<type>_<issue-slug>-<n>.json` filename convention, the sequence-number rule (find the highest existing `n` and use `n+1`), the schema-discovery flow (introspect via `get-card-type-schema` before writing), and the document envelope. Calls out that cards are written even on success and not overwritten on retry.
- Expanded the Required Flow numbering to interleave a "write the validation card" step between "run the validator" and "push" so the audit trail lands with the rest of the work.
- Updated the target-realm artifact-structure tree to include `Validations/` with the five card patterns.
Updated the runbook:
- Replaced the "Validation artifact cards" bullet under "Things explicitly NOT in Phase 1" with a positive "Phase 1 includes validation artifact cards" section pointing at the operations skill's new section.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
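The sequence-number rule (find the highest existing `n`, use `n+1`) can be sketched as follows. The sketch assumes the agent already has a list of existing realm-relative paths and that numbering starts at 1 when no card exists yet; the helper name `nextArtifactPath` and the example `type` value `lint` are hypothetical, since the exact type prefixes are defined in the skill and not in this message.

```typescript
// Hypothetical helper: compute the next artifact path for one validator type
// and one issue, following `Validations/<type>_<issue-slug>-<n>.json`.
function nextArtifactPath(
  existingPaths: string[],
  type: string,
  issueSlug: string,
): string {
  const prefix = `Validations/${type}_${issueSlug}-`;
  const suffix = ".json";
  let highest = 0; // assumption: the first card gets sequence number 1
  for (const path of existingPaths) {
    if (!path.startsWith(prefix) || !path.endsWith(suffix)) continue;
    const n = Number(path.slice(prefix.length, -suffix.length));
    // Non-numeric remainders (e.g. a longer slug sharing the prefix) are skipped.
    if (Number.isInteger(n) && n > highest) highest = n;
  }
  return `${prefix}${highest + 1}${suffix}`;
}
```

Because the function only ever returns an unused number, a retry produces a new card instead of overwriting an earlier one.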
Two fix-ups from observing the recipe end-to-end run:
1. The runbook's three-prompt structure (Bootstrap / Work next / Continue or finish) was a defensive Phase 1 design: pause between each phase so we could verify each step. The recipe run proved a single prompt drives the whole loop end-to-end without intervention. Collapsed the three prompts into one that bootstraps + works every Issue + completes the project, and rewrote the bootstrap skill's Completion section: it no longer says "stop and report; don't start implementing." It now explicitly hands off to scheduling for in-session continuation. Also retired the "How Phase 2 builds on this" section since the single-prompt form was Phase 2's stated goal; replaced with a follow-up-work checklist (slash command wrapper, `_parse` realm-server endpoint, dev-CLI setup automation, orchestrator retirement).
2. The SDK orchestrator wrote `Issues/bootstrap-seed.json` as a visible "this is where the factory started" anchor in the realm UI. The new flow had been omitting it, leaving a gap in the visual parity. Added a "The bootstrap-seed Issue" section to the bootstrap skill documenting the attribute shape (issueId `<projectCode>-0`, status `done`, issueType `bootstrap`, priority `critical`, order `0`) and the relationships (project + relatedKnowledge links so a future-you reading the seed Issue has the same brief context any other Issue carries). Phase 1's documented validation plan now expects it.
Status section in the runbook bumped from "draft / Phase 1 spec" to "working end-to-end on sticky-note and recipe as of 2026-05-15."
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The runbook is the user-facing guide for running the factory on any brief; sticky-note was just my dev fixture during the CS-11149 back-and-forth. Removed:
- Status banner that named sticky-note / recipe as the briefs the runbook is "working against"; replaced with the existing introduction, which describes the system in general terms.
- Prerequisite example URL hardcoded to sticky-note; it now uses a `<brief-slug>` placeholder so the reader inserts their own.
- "Validation plan" section that named sticky-note as the simplest exerciser and referenced the Playwright suite; replaced with an "Expected output" section that describes the artifacts of a successful run in generic terms. The Playwright suite still exists and tests sticky-note specifically, but that's a CI / contributor concern, not user-facing.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The "interactive" qualifier was distinguishing this from the SDK orchestrator path during the CS-11149 migration. With the orchestrator scheduled for retirement and this runbook being the canonical (and shortly, only) way to drive the factory, the qualifier is dead weight. No other files referenced the path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Previously the runbook said "fix any failures (incrementing the sequence number on retry)" and the operations skill said "Fix anything that fails and re-run". Both are correct but compressed enough that an agent could read "re-run once and move on" instead of "loop until every validator passes." That's a real bug class: a partial-pass run would still mark the Issue done. Both docs now lay out the loop explicitly:
- Run each validator, write its artifact card capturing the current result (passed OR failed; the audit trail is the point).
- If anything failed, fix the source, push, re-run the failing validators, write NEW artifact cards with the next sequence number (don't overwrite previous ones).
- Stop iterating only when every validator's most recent artifact card has `status: "passed"`.
- Then mark the Issue done. "Most failed but a few passed, good enough" is not the bar.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
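The stop condition ("every validator's most recent artifact card has `status: "passed"`") can be expressed as a small check. This is a sketch, not the skill's wording: the `ArtifactCard` shape, the short validator names, and the `"failed"` status value are assumptions; only the five validators and the `"passed"` status come from the text.

```typescript
type ValidatorName = "lint" | "parse" | "evaluate" | "instantiate" | "test";

// Assumed, simplified view of a validation artifact card.
interface ArtifactCard {
  validator: ValidatorName;
  sequence: number;
  status: "passed" | "failed";
}

const ALL_VALIDATORS: ValidatorName[] = [
  "lint",
  "parse",
  "evaluate",
  "instantiate",
  "test",
];

// Returns the validators whose most recent card is not a pass.
// A validator with no card at all counts as not yet passed.
function failingValidators(cards: ArtifactCard[]): ValidatorName[] {
  const latest = new Map<ValidatorName, ArtifactCard>();
  for (const card of cards) {
    const seen = latest.get(card.validator);
    if (!seen || card.sequence > seen.sequence) latest.set(card.validator, card);
  }
  return ALL_VALIDATORS.filter((v) => latest.get(v)?.status !== "passed");
}
```

The Issue may be marked done only when this returns an empty list; an earlier pass followed by a later failure keeps the validator on the list.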
The validator loop now terminates on success OR on one of three bail-out conditions, matching the kind of safety the SDK orchestrator gave via `maxIterationsPerIssue`. Without these, an agent could spiral on a flaky test or a brief requirement it can't satisfy and burn context indefinitely.
Per-issue limits (operations skill, new "Bailing out" section):
- 8 total iterations per Issue (matches the orchestrator's old `maxIterationsPerIssue` default).
- 3 consecutive failures of the same validator with the same error message: the fix isn't working, stop.
- 5 distinct fix attempts on a single validator without a pass: the problem is outside this Issue's scope.
When any limit hits, the agent: sets the Issue to `blocked`, appends a comment naming the limit hit + the latest failure message verbatim + an enumeration of what was tried (keyed to artifact-card sequence numbers), pushes, and hands back to scheduling for the next eligible Issue. The Project's `projectStatus` is only flipped to `completed` if every Issue ended `done`; `blocked` Issues are left in place so the human can investigate.
The runbook prompt now references the bail-out limits inline (without spelling them out) and points at the operations skill for the detail. The scheduling skill's status-transition table cross-references the bail-out section so an agent looking at "when do I flip to blocked?" finds the answer with one hop.
Outer-loop safety is implicit: once an Issue is `blocked`, the scheduler skips it, so the loop terminates naturally once every Issue is either `done` or `blocked`. No outer-cycle cap is needed (the SDK orchestrator's 50-cycle cap was a defense against scheduler bugs, not validator spirals).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
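The three limits can be sketched as a single check over the run history. This is one possible reading, with assumptions stated up front: a "fix attempt" is modelled as one failed run of a validator since its last pass, the iteration limit triggers once 8 iterations have been used, and the `ValidatorRun` shape, the reason strings, and the name `bailOutReason` are all hypothetical.

```typescript
// Assumed, simplified record of one validator run (oldest first in the array).
interface ValidatorRun {
  validator: string;
  passed: boolean;
  error?: string;
}

const MAX_ITERATIONS = 8;
const MAX_SAME_FAILURE = 3;
const MAX_FIX_ATTEMPTS = 5;

// Returns a reason string when a limit is hit, or null to keep iterating.
function bailOutReason(iterations: number, runs: ValidatorRun[]): string | null {
  if (iterations >= MAX_ITERATIONS) return "max-iterations";

  const byValidator = new Map<string, ValidatorRun[]>();
  for (const run of runs) {
    const history = byValidator.get(run.validator) ?? [];
    history.push(run);
    byValidator.set(run.validator, history);
  }

  for (const history of byValidator.values()) {
    // Collect the failures since the last pass, newest first.
    const trailing: ValidatorRun[] = [];
    for (let i = history.length - 1; i >= 0; i--) {
      const run = history[i];
      if (!run || run.passed) break;
      trailing.push(run);
    }
    const newest = trailing[0];
    const recent = trailing.slice(0, MAX_SAME_FAILURE);
    if (
      newest &&
      recent.length === MAX_SAME_FAILURE &&
      recent.every((r) => r.error === newest.error)
    ) {
      return "repeated-failure";
    }
    if (trailing.length >= MAX_FIX_ATTEMPTS) return "fix-attempts-exhausted";
  }
  return null;
}
```

When a reason comes back, the flow described above applies: flip the Issue to `blocked`, record the reason and the latest failure in a comment, push, and return to scheduling.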
The "Gaps blocking Phase 1" section (CLI commands to add / skill content to add / things NOT in Phase 1) was meta tracking from when this doc was the Phase 1 spec. Every line of it is now done or stale:
- "CLI commands to add" → all three landed (boxel lint, parse, test)
- "Skill content to add / rewrite" → all three skills authored
- "Phase 1 includes validation artifact cards" → done; covered in the operations skill's "Validation artifact cards" section
- "Things explicitly NOT in Phase 1: iteration limits" → wrong as of the previous commit, which added bail-out limits
Spec history belongs in git/PR descriptions, not a user-facing runbook. The section is gone. The follow-up-work list survives (it points at genuine in-flight rough edges) and the orchestrator retirement bullet's "Phase 1" framing is gone.
Also dropped the "Phase 1 dev setup note" framing in the prerequisites: same setup steps, just no longer labeled as a temporary Phase 1 workaround. (It still IS temporary; the "Follow-up work" section calls that out under "Dev `boxel-cli` setup automation.")
No more Phase 1 / Phase 2 references in the runbook.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Why we're doing this
Today, running the software factory means running `pnpm factory:go`, which spins up a Node process that drives the loop via the Claude Agent SDK. Like an interactive Claude Code session, it bills against your Claude subscription.
Starting June 15, 2026, Anthropic is carving Agent SDK usage out of the regular subscription pool: SDK and `claude -p` invocations on subscription plans will draw from a separate, smaller monthly Agent SDK credit, while interactive Claude Code sessions stay uncapped. So the cheapest, most predictable place to run long agentic loops going forward is inside an interactive session you already have open.
This PR teaches the software factory to run that way: you paste one prompt into Claude Code, and the agent does the whole run (bootstrap, implement each Issue, validate, mark done, repeat until the project is complete). No SDK process, no separate credit pool.
The existing SDK orchestrator (`issue-loop.ts`, `factory-agent/`, etc.) is not deleted in this PR. The two paths run side by side. We'll retire the orchestrator in a follow-up PR once the interactive path has been used in practice for a while.
How to try it
Quick one-time setup:
Then run a factory:
In the Claude Code session, paste:
The agent bootstraps the realm, works each implementation Issue end-to-end, runs all five validators per Issue (with bail-out limits so it doesn't spiral), writes audit-trail cards under `Validations/`, and flips the project to `completed`. Open the target realm in the Boxel host UI to see what got built.
Want to try a richer brief? Substitute `recipe`, `gradebook`, or any of the other briefs in the source realm.
The runbook (`packages/software-factory/docs/runbook.md`) is the canonical reference — the snippet above is just the TL;DR.
What's in the PR
3 new `boxel-cli` commands (`boxel lint`, `boxel parse`, `boxel test`) so the agent can run the validators from its Bash surface.
The other two validators (`evaluate-module`, `instantiate-card`) and the schema fetcher (`get-card-type-schema`) were already host commands reachable via `boxel run-command`.
3 vendor-neutral skills (`software-factory-bootstrap`, `software-factory-scheduling`, `software-factory-operations`) the agent loads when it enters a software-factory session.
A runbook at `packages/software-factory/docs/runbook.md` documenting the user-facing prompt, prerequisites, and what to expect in the target realm.
The agent owns the whole loop: bootstrap, pick next Issue, implement, validate, persist audit cards, mark done — repeat until the backlog is empty. The validator loop has bail-out limits (8 iterations per Issue, or 3 identical consecutive failures, or 5 distinct fix attempts without a pass) so it doesn't spiral on a brief it can't satisfy. If an Issue genuinely can't be made green, it ends up `blocked` with a comment explaining what was tried, and the agent moves on.
What's deliberately not in this PR
Validated against
End-to-end clean runs (single-prompt, no manual intervention) against the sticky-note and recipe briefs.
Both produced a populated target realm with the full set of expected artifacts including the `Validations/` folder.
How to review
The skill text under `packages/software-factory/.agents/skills/` is the biggest surface — those three Markdown files are what the agent reads to drive the loop, so they're worth a careful read. The runbook and the three new CLI commands are smaller and self-contained.
🤖 Generated with Claude Code