docs(lessons): extend measure-before-optimize for PR/comment claims #719
Conversation
Adds keywords and an anti-pattern section for the failure mode of writing
latency-win claims ("should drop to ~5-10s") in PR descriptions or issue
comments without a baseline number or cost-model comparison. The existing
lesson covered code-level premature optimization but did not trigger on
the more common failure of anchoring a projected speedup on one component
(prompt size) instead of the dominant cost (turns × tok/s + startup).
Motivation: ErikBjare/bob#651 — shipped #713 claiming
context-skip would drop voice subagent lookups to 5-10s without showing
that prompt-eval was actually the dominant latency source. Correct
response (measurement-first) was #718.
Greptile Summary

Purely additive documentation change to the measure-before-optimize lesson.

Confidence Score: 5/5 — safe to merge: documentation-only, no code changed, all additions are well-formed and consistent with existing conventions. All changes are additive markdown edits to a single lesson file; no logic, schema, or runtime behavior is affected. No P0/P1 findings identified. No files require special attention.
Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Performance claim in PR/comment] --> B{Is there a baseline measurement?}
    B -- No --> C[❌ Anti-pattern: unanchored claim\ne.g. 'should drop to ~5-10s']
    B -- Yes --> D{Is cost model documented?}
    D -- No --> E[❌ Anti-pattern: single-component anchor\ne.g. only prompt size considered]
    D -- Yes --> F{Post-merge: measured delta posted?}
    F -- No --> G[❌ Anti-pattern: re-projected estimate instead of data]
    F -- Yes --> H[✅ Correct shape:\nbaseline → cost model →\ndominant-component projection →\nmeasured delta]
```
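For readers who prefer code to diagrams, the same decision tree can be written as a throwaway checklist function. This is purely illustrative — it is not part of the lesson or this PR:

```python
def review_perf_claim(has_baseline: bool, has_cost_model: bool,
                      measured_delta_posted: bool) -> str:
    """Encode the flowchart above as a reviewer checklist (illustrative only)."""
    if not has_baseline:
        return "anti-pattern: unanchored claim (e.g. 'should drop to ~5-10s')"
    if not has_cost_model:
        return "anti-pattern: single-component anchor (e.g. only prompt size considered)"
    if not measured_delta_posted:
        return "anti-pattern: re-projected estimate instead of data"
    return "correct shape: baseline -> cost model -> dominant-component projection -> measured delta"
```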
Reviews (1) — last reviewed commit: "docs(lessons): extend measure-before-opt..."
Summary
Adds keywords and an anti-pattern section to `measure-before-optimize` covering the failure mode of writing latency-win claims (e.g. "should drop to ~5-10s") in PR descriptions or issue comments without a baseline number or cost-model comparison.

The existing lesson covered code-level premature optimization (caching, profiling, `pytest`), but did not trigger on the more common agent failure: anchoring a projected speedup on one component (prompt size) instead of the dominant cost (turns × tok/s + startup + tool-use).

Motivation
ErikBjare/bob#651 — in #713 I wrote "mode=fast now skips --context files entirely → simple lookups should drop to ~5-10s" without showing that prompt-eval was actually the dominant latency source. Erik pushed back: "20k tokens is not a dominating latency source, the total number of steps/turns in the workflow and the tok/s of the model is probably the main driver." He was right.
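To make that pushback concrete, here is a back-of-the-envelope version of the cost model named above (turns × tok/s + startup + tool-use). Every number is an illustrative assumption, not a measurement from #713 or #718:

```python
# Back-of-the-envelope latency model (illustrative numbers only):
#   total ≈ startup + prompt_eval + turns × (gen_tokens / tok_per_s + tool_time)

startup_s = 3.0              # spawn/startup overhead (assumed)
prompt_tokens = 20_000       # the context size discussed in #651
prefill_tok_s = 2_000        # prompt-eval throughput (assumed)
turns = 4                    # agent turns in the workflow (assumed)
gen_tokens_per_turn = 300    # generated tokens per turn (assumed)
gen_tok_s = 40               # generation throughput (assumed)
tool_time_s = 1.5            # tool-call overhead per turn (assumed)

prompt_eval_s = prompt_tokens / prefill_tok_s
per_turn_s = gen_tokens_per_turn / gen_tok_s + tool_time_s
total_s = startup_s + prompt_eval_s + turns * per_turn_s

print(f"total ≈ {total_s:.0f}s; prompt eval ≈ {prompt_eval_s:.0f}s "
      f"({prompt_eval_s / total_s:.0%} of total)")
# With these assumptions: total ≈ 49s, prompt eval ≈ 10s (~20%).
# Skipping the context cannot by itself get "simple lookups" to 5-10s.
```

Under these assumptions the turn-driven term dominates even with a 20k-token prompt, which is exactly why the lesson asks for a measured baseline rather than a single-component projection.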
The measurement-first response was #718, which added per-stage timing (`dispatch->spawn`, `spawn->first_output`, `first_output->done`, `quiet_tail`) to the voice subagent bridge. Claims about where time goes can now be evidence-based.
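The sketch below is not the #718 implementation — just an illustration of where those stage boundaries sit, with `run_subagent` as a hypothetical stand-in for however the bridge actually spawns and streams the subagent:

```python
import time
from typing import Callable, Iterable


def timed_lookup(run_subagent: Callable[[str], Iterable[str]], prompt: str) -> dict[str, float]:
    """Record per-stage wall-clock durations for one subagent lookup.

    Stage names mirror the ones #718 added: dispatch->spawn,
    spawn->first_output, first_output->done, quiet_tail.
    """
    stages: dict[str, float] = {}
    t_dispatch = time.monotonic()

    chunks = run_subagent(prompt)            # hypothetical: spawns the subagent
    t_spawn = time.monotonic()
    stages["dispatch->spawn"] = t_spawn - t_dispatch

    t_first, t_last = None, t_spawn
    for _chunk in chunks:                    # stream output as it arrives
        now = time.monotonic()
        if t_first is None:
            t_first = now
            stages["spawn->first_output"] = t_first - t_spawn
        t_last = now
    t_done = time.monotonic()

    stages["first_output->done"] = t_done - (t_first or t_spawn)
    stages["quiet_tail"] = t_done - t_last   # silence between last output and completion
    return stages
```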
Changes

New keywords:

- "claiming latency win without measurement"
- "PR description promises speedup"
- "expected speedup not verified"
- "context size is the bottleneck"
- "skipping context will speed this up"

Test plan
- `python3 gptme-contrib/packages/gptme-lessons-extras/src/gptme_lessons_extras/validate.py lessons/workflow/measure-before-optimize.md` passes
- No changes to existing `Rule`, `Context`, `Outcome`, or `Related` sections — additive only