diff --git a/lessons/workflow/measure-before-optimize.md b/lessons/workflow/measure-before-optimize.md
index db388085..76acf2de 100644
--- a/lessons/workflow/measure-before-optimize.md
+++ b/lessons/workflow/measure-before-optimize.md
@@ -7,6 +7,11 @@ match:
   - this looks complex so it must be slow
   - pytest --profile
   - premature optimization
+  - claiming latency win without measurement
+  - PR description promises speedup
+  - expected speedup not verified
+  - context size is the bottleneck
+  - skipping context will speed this up
 status: active
 ---
 
@@ -24,6 +29,8 @@ Observable signals indicating need for measurement:
 - Assuming slowness based on complexity without measurement
 - Building caching solutions before confirming problems exist
 - Unable to articulate specific performance metrics that need improvement
+- Writing "should drop to Xs" in a PR description or issue comment without a baseline number
+- Pointing at one component (prompt size, import time, N+1) as "the" bottleneck without having compared it to other candidates
 
 ## Pattern
 Measure first, then decide based on data:
@@ -49,6 +56,21 @@ pytest tests/test_lessons*.py --profile --durations=10
 # No profiling data to support the need
 ```
 
+**Anti-pattern**: Latency-win claims in PR descriptions / issue comments
+```text
+# smell: PR body says "simple lookups should drop to ~5-10s"
+# but no before/after timing is shown, and the target number
+# is anchored on one component (prompt size) rather than on
+# the actual cost model (turns × tok/s + startup + tool-use).
+
+# Correct shape:
+# - show baseline (e.g. "currently p50 = 42s over N calls")
+# - explain cost model and which component dominates
+# - project delta from the dominant component, not from the
+#   smallest one
+# - after landing, post measured delta — not re-projected one
+```
+
 ## Outcome
 Following this pattern leads to:
 - **Evidence-based decisions**: Optimize what actually matters
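The "show baseline (p50 over N calls)" step that the added anti-pattern block asks for can be sketched as a small timing harness. This is a minimal illustration, not part of the patched lesson file: `measure_p50` and the stand-in workload are hypothetical names, and the real call under test would replace the lambda.

```python
import time
import statistics

def measure_p50(fn, n=20):
    """Time n calls of fn and return the median (p50) latency in seconds."""
    samples = []
    for _ in range(n):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)

# Stand-in workload; swap in the actual operation being optimized.
p50 = measure_p50(lambda: sum(range(100_000)))
print(f"baseline p50 = {p50:.4f}s over 20 calls")
```

Reporting the median rather than the mean keeps one slow outlier (cold start, GC pause) from distorting the baseline; posting this number before the change is what makes the after-landing delta verifiable.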