
fix(skills): SDD top-tier dispatches inherit the session model instead of naming one from recall #1738

Open
snvtac wants to merge 1 commit into obra:dev from snvtac:sdd-inherit-session-model


snvtac commented Jun 11, 2026


Targets dev. Complementary to #1719 — different mechanism, independently mergeable. If maintainers prefer a single PR, this paragraph can fold into #1719 and this one can close.

Who is submitting this PR? (required)

| Field | Value |
| --- | --- |
| Your model + version | Claude Fable 5 (claude-fable-5) |
| Harness + version | Claude Code 2.1.173 (Linux) |
| All plugins installed | superpowers 5.1.0 (claude-plugins-official, f2cbfbe), oh-my-claudecode, vercel, supabase |
| Human partner who reviewed this diff | @snvtac: experienced the session failure, directed the change, read the complete diff (the single paragraph quoted below) before submission |

What problem are you trying to solve?

In a real session (Claude Code 2.1.173, session model claude-fable-5, superpowers 5.1.0), the subagent-driven-development controller dispatched review subagents with model: opus — a silent downgrade from the session's Fable 5. My human partner reported exactly this on #1719: "I default to the fable 5 model, but during task execution it switches to Opus 4.8." The root cause is the one #1719 diagnoses: "use the most capable available model" gets resolved from training-data recall, and recall goes stale the moment a new frontier model ships.

What does this PR change?

Adds one paragraph to skills/subagent-driven-development/SKILL.md → Model Selection, directly after the "Architecture, design, and review tasks" line. The complete diff:

Don't pick "most capable" from memory — inherit instead. Your knowledge of the model lineup goes stale the moment a newer frontier model ships, so a model name recalled from training data silently downgrades the dispatch. When your dispatch tool takes an optional model parameter, omit it for these tasks: the subagent inherits the session's model — the one your human partner already chose — and inheritance can never resolve to a stale name. Pass an explicit model only to route mechanical or integration tasks to a cheaper tier, and take the name from what the harness offers, not from recall.

Is this change appropriate for the core library?

Yes. General-purpose and harness-agnostic ("when your dispatch tool takes an optional model parameter"): in Claude Code the Task tool documents omit-as-inherit ("If omitted, … inherits from the parent"); harnesses with the same affordance get the same benefit; harnesses without it fall through to the existing tier guidance unchanged. No third-party tools or services.
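To make the omit-as-inherit rule concrete, here is a minimal Python sketch of the decision the paragraph asks a controller to make. It is not Claude Code's API and not part of the diff: `build_dispatch_args`, the tier labels, and the argument names are hypothetical, chosen only for illustration. The one behavior taken from the text is that leaving out `model` lets the subagent inherit the session model, and that an explicit name should come from what the harness offers.

```python
from typing import Optional

# Hypothetical tier labels for illustration; the skill describes its tiers in prose.
TOP_TIER_KINDS = {"architecture", "design", "review"}


def build_dispatch_args(
    task_kind: str,
    prompt: str,
    offered_models: list[str],
    cheaper_model: Optional[str] = None,
) -> dict[str, str]:
    """Build arguments for a hypothetical dispatch tool with an optional `model`."""
    args = {"prompt": prompt}
    if task_kind in TOP_TIER_KINDS:
        # Omit `model` entirely so the subagent inherits the session model.
        return args
    # Mechanical or integration work: name a cheaper tier only if the harness
    # actually offers it; a name the harness does not list is dropped.
    if cheaper_model is not None and cheaper_model in offered_models:
        args["model"] = cheaper_model
    return args


offered = ["haiku", "sonnet"]  # assumed example of a harness-provided list
review_args = build_dispatch_args("review", "Review task 3", offered, cheaper_model="haiku")
mechanical_args = build_dispatch_args("mechanical", "Rename one variable", offered, cheaper_model="haiku")
recalled_args = build_dispatch_args("mechanical", "Rename one variable", offered, cheaper_model="opus")
```

In this sketch `review_args` carries no `model` key, `mechanical_args` routes to `haiku`, and `recalled_args` falls back to inheritance because `opus` is not in the offered list. Falling back silently is one possible design; a real controller might prefer to surface the mismatch.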

What alternatives did you consider?

Does this PR contain multiple unrelated changes?

No — one paragraph, one file.

Existing PRs

Environment tested

| Harness (e.g. Claude Code, Cursor) | Harness version | Model | Model version/ID |
| --- | --- | --- | --- |
| Claude Code (authoring session) | 2.1.173 | Claude Fable 5 | claude-fable-5 |
| Claude Code (eval subagents) | 2.1.173 | Claude Haiku 4.5 | claude-haiku-4-5 |
| Claude Code (eval subagents) | 2.1.173 | Claude Sonnet 4.6 | claude-sonnet-4-6 |

New harness support (required if this PR adds a new harness)

Not applicable — no new harness.

Evaluation

  • Initial prompt: my human partner reported the session failure (a Fable 5 session with SDD dispatching Opus reviewers, the same report as their comment on #1719) and asked for a fix and a PR.
  • Methodology: fresh Claude Code subagents acting as SDD controllers, given the Model Selection section verbatim (baseline vs. with this paragraph) plus a realistic Task-tool schema excerpt — the model parameter description including its omit-inherits sentence, listed among the other parameters the way a real controller sees it — and the session stated as claude-fable-5. Scenarios: (A) model list visible, (B) no list visible, (C) list visible + sunk-cost and authority pressure ("three opus dispatches worked great" + human says "don't overthink it, just dispatch"). Run on Haiku 4.5 and Sonnet 4.6. 18 runs total: 8 baseline RED, 2 GREEN on the failing cell, 2 regression, plus 6 discarded early baseline runs (see honest-headline bullet).
  • Honest headline: I could NOT reproduce #1719's 0/6 baseline. With realistic framing, the baseline failed only 1/8 in my environment: current Claude Code's own system prompt names Fable, so my controllers were not lineup-stale the way #1719's were. The single baseline failure was Sonnet under pressure, which picked opus (1/3 across repeats of that cell). The 6 discarded early runs narrated the omit-inherits semantics in the scenario prose instead of burying them in the schema, and the baseline passed 6/6 there. That over-salient framing disqualifies those runs as a RED baseline, but the result is itself informative: what fixes this failure is making inheritance salient at dispatch time, which is exactly what this paragraph does inside the skill.
  • Outcome change (before → after): the pressure cell went from 1/3 wrong (baseline, opus) to 2/2 correct with the paragraph (one run omitted the parameter as instructed; one passed fable explicitly — correct outcome, partial mechanism adoption). Regression: a mechanical 1-file task still routes to haiku 2/2 with the new text.
  • What this PR is claiming, calibrated to that data: for current-gen controllers in current Claude Code the paragraph is belt-and-suspenders — it closes the residual pressure failure. The structural value is for the next lineup turnover, when fable becomes the stale name and every memory-resolved dispatch silently downgrades again. Omit-and-inherit does not rot, and it is the only variant that closes the no-list case without requiring lineup knowledge.
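The pass/fail rule and the run accounting described above can be sketched in Python as follows. This is an illustrative reconstruction, not the harness that produced the results: `grade_top_tier_dispatch`, `SESSION_NAMES`, and the category labels are hypothetical, and treating the alias `fable` as equivalent to `claude-fable-5` is an assumption. The counts are the ones stated in the bullets.

```python
from typing import Optional

# Assumption: the alias and the full ID both refer to the session model.
SESSION_NAMES = {"fable", "claude-fable-5"}


def grade_top_tier_dispatch(model_arg: Optional[str]) -> str:
    """Grade the `model` argument of a top-tier (review) dispatch."""
    if model_arg is None:
        return "pass (inherited)"   # parameter omitted, as the paragraph instructs
    if model_arg in SESSION_NAMES:
        return "pass (explicit)"    # correct outcome, partial mechanism adoption
    return "fail (downgrade)"       # e.g. a name recalled from training data


# Run counts as reported in the Methodology bullet.
RUN_COUNTS = {
    "baseline RED": 8,
    "GREEN on the failing cell": 2,
    "regression": 2,
    "discarded early baseline": 6,
}
total_runs = sum(RUN_COUNTS.values())

grades = [grade_top_tier_dispatch(m) for m in (None, "fable", "opus")]
```

Here `total_runs` reconciles with the stated 18 runs, and the three sample grades correspond to the omitted-parameter GREEN run, the explicit-fable GREEN run, and the baseline opus failure.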

Rigor

  • If this is a skills change: I used superpowers:writing-skills and completed adversarial pressure testing (RED-GREEN against watched baseline failures per its core principle; scenario C combines sunk-cost and authority pressure; results above)
  • This change was tested adversarially, not just on the happy path
  • I did not modify carefully-tuned content (Red Flags table, rationalizations, "human partner" language) — the change is one additive paragraph; the tier structure and all existing wording are untouched

Human review

  • A human has reviewed the COMPLETE proposed diff before submission

The complete diff is the single paragraph quoted above. My human partner (@snvtac) read it in-session, was offered the alternative of commenting on #1719 instead, and explicitly chose PR submission.
