perf(gptme-voice): skip context loading in fast subagent mode (#713)

TimeToBuildBob merged 1 commit into master
Conversation
Fast-mode subagents no longer pass --context files to gptme, removing the 20k+ token workspace context load that was the dominant latency source (~30-60s). Simple lookups now run in ~5-10s instead of ~1 minute. Smart mode is unchanged — it still loads full workspace context for queries that need it. Also streams stdout line-by-line so subagent_status can show the last action the running subagent performed (addresses feedback on #711).
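As a rough illustration of the conditional command construction this describes (not the PR's actual bridge code), here is a minimal sketch. The function name `build_gptme_command` and the `context_files` parameter are hypothetical; the flag shapes follow the sequence diagram below:

```python
def build_gptme_command(task: str, mode: str, context_files: list[str]) -> list[str]:
    """Illustrative only: build the gptme subagent invocation."""
    cmd = ["gptme", "--non-interactive"]
    if mode == "fast":
        # Fast mode: cheap model and, crucially, no --context flags,
        # so gptme never loads the 20k+ token workspace context.
        cmd += ["--model", "haiku"]
    else:
        # Smart mode is unchanged: it still loads full workspace context.
        for path in context_files:
            cmd += ["--context", path]
    cmd.append(task)
    return cmd
```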
Codecov Report: ✅ All modified and coverable lines are covered by tests.
Greptile Summary

This PR improves fast-mode subagent latency by skipping the workspace context load. Confidence Score: 5/5. Safe to merge; both findings are P2 style/quality suggestions that do not block the feature. The core change (conditionally omitting `--context files`) is simple, correct, and well-tested. The streaming refactor is clean. The remaining comments are a weak test assertion and a zombie-process hardening suggestion; neither causes incorrect behavior on the happy path. No files require special attention beyond the P2 notes above.
Sequence Diagram

```mermaid
sequenceDiagram
    participant RT as Realtime Voice
    participant TB as GptmeToolBridge
    participant SA as gptme subagent
    RT->>TB: handle_function_call("subagent", {mode: "fast"})
    TB->>TB: assign task_id, create asyncio.Task
    TB-->>RT: {status: "dispatched", task_id}
    TB->>SA: gptme --non-interactive [--model haiku] task
    Note over TB,SA: fast mode: no --context files
    loop streaming stdout
        SA-->>TB: stdout line
        TB->>TB: pending.last_output = line
    end
    RT->>TB: handle_function_call("subagent_status", {})
    TB-->>RT: {pending: [{task_id, last_output, elapsed_seconds}]}
    SA-->>TB: exit 0
    TB->>TB: read response_file
    TB->>RT: on_result(response_text)
```
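In code terms, the streaming loop and status payload from the diagram might look like the following. This is a sketch under assumed names (`_run_subagent`, a per-task `pending` dict), not the bridge's actual implementation:

```python
import asyncio
import time

async def _run_subagent(pending: dict, cmd: list[str]) -> None:
    """Illustrative streaming runner matching the diagram above."""
    pending["started_at"] = time.monotonic()
    proc = await asyncio.create_subprocess_exec(
        *cmd, stdout=asyncio.subprocess.PIPE, stderr=asyncio.subprocess.STDOUT
    )
    assert proc.stdout is not None
    # Stream line-by-line instead of communicate(), so subagent_status
    # can report the last action the subagent performed while running.
    async for raw in proc.stdout:
        pending["last_output"] = raw.decode(errors="replace").rstrip()
    await proc.wait()

def subagent_status(pending_tasks: dict[str, dict]) -> dict:
    """Shape of the status payload shown in the diagram."""
    return {
        "pending": [
            {
                "task_id": task_id,
                "last_output": info.get("last_output", ""),
                "elapsed_seconds": round(time.monotonic() - info["started_at"], 1),
            }
            for task_id, info in pending_tasks.items()
        ]
    }
```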
```python
        if entry.get("last_output"):
            assert "Found 3 active tasks" in entry["last_output"]
```
Weak assertion makes the test self-defeating
The central claim of this test is that `last_output` is populated once the subagent produces stdout lines, but the assertion is guarded by `if entry.get("last_output"):`. If `last_output` is always absent or empty — the exact regression being tested against — the assert on line 207 is simply never reached and the test passes silently.

Two `asyncio.sleep(0)` yields may not be enough for the async stdout reader to have fully iterated; if the reader hasn't run yet, `last_output` is `""` and the branch is skipped entirely. Consider promoting this to an unconditional assert:
```diff
-        if entry.get("last_output"):
-            assert "Found 3 active tasks" in entry["last_output"]
+        assert entry.get("last_output"), "last_output must be set once subagent produces stdout"
+        assert "Found 3 active tasks" in entry["last_output"]
```
See my comment here: https://github.com/ErikBjare/bob/issues/651#issuecomment-4283398928
…719)

Adds keywords and an anti-pattern section for the failure mode of writing latency-win claims ("should drop to ~5-10s") in PR descriptions or issue comments without a baseline number or cost-model comparison. The existing lesson covered code-level premature optimization but did not trigger on the more common failure of anchoring a projected speedup on one component (prompt size) instead of the dominant cost (turns × tok/s + startup).

Motivation: ErikBjare/bob#651 — shipped #713 claiming context-skip would drop voice subagent lookups to 5-10s without showing that prompt-eval was actually the dominant latency source. Correct response (measurement-first) was #718.
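For intuition, the cost model the lesson names (turns × tok/s + startup) can be written out directly. Every number below is an illustrative assumption, not a measurement from #713:

```python
# Back-of-the-envelope latency model: total latency is roughly
#   startup + turns * (prompt_tokens / prefill_tok_s + output_tokens / gen_tok_s)
# All figures here are made-up illustrations, not measurements.
def subagent_latency(turns, prompt_tokens, prefill_tok_s, output_tokens, gen_tok_s, startup):
    per_turn = prompt_tokens / prefill_tok_s + output_tokens / gen_tok_s
    return startup + turns * per_turn

with_context = subagent_latency(
    turns=3, prompt_tokens=22_000, prefill_tok_s=2_000,
    output_tokens=300, gen_tok_s=50, startup=2.0,
)  # ≈ 2 + 3 * (11 + 6) = 53s
without_context = subagent_latency(
    turns=3, prompt_tokens=2_000, prefill_tok_s=2_000,
    output_tokens=300, gen_tok_s=50, startup=2.0,
)  # ≈ 2 + 3 * (1 + 6) = 23s
# Dropping 20k prompt tokens helps, but turns and generation still dominate,
# so "~5-10s" does not follow without a baseline measurement.
```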
Summary
- `mode=fast` no longer passes `--context files` to gptme, removing the 20k+ token workspace context load that was the dominant latency source. Simple lookups should drop from ~60s to ~5-10s.
- Switched from `communicate()` to streaming stdout line-by-line so `subagent_status` can show `last_output`, the last line the running subagent produced. Addresses feedback from feat(gptme-voice): add subagent_status and subagent_cancel tools (#711).

Root cause of the 1-minute latency
`--context files` triggers gptme to load all files listed in `gptme.toml` (ABOUT.md, GOALS.md, ARCHITECTURE.md, people/, etc.), easily 20k+ tokens. For a fast lookup like "what are my active tasks?", that context is unnecessary overhead. Without it, the subagent starts answering almost immediately.

Test plan
- New and updated tests: `test_execute_fast_mode_skips_context_loading`, `test_execute_smart_mode_keeps_context_loading`, `test_subagent_status_shows_last_output`, and updated mocks for streaming
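For illustration, the fast-vs-smart assertions could take this shape against the hypothetical `build_gptme_command` sketch near the top of this page; the PR's actual tests exercise the real bridge code:

```python
def test_fast_mode_skips_context_loading_sketch():
    # Hypothetical shape of the first test named above.
    cmd = build_gptme_command("what are my active tasks?", mode="fast",
                              context_files=["ABOUT.md", "GOALS.md"])
    assert "--context" not in cmd  # fast mode must not load workspace context

def test_smart_mode_keeps_context_loading_sketch():
    cmd = build_gptme_command("plan the refactor", mode="smart",
                              context_files=["ABOUT.md", "GOALS.md"])
    assert cmd.count("--context") == 2  # smart mode keeps the full context
```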