Skip to content

feat(acp-bridge): cross-client real-time sync completeness (5 fixes)#4484

Open
chiga0 wants to merge 1 commit into
QwenLM:daemon_mode_b_mainfrom
chiga0:feat/daemon-cross-client-sync-completeness
Open

feat(acp-bridge): cross-client real-time sync completeness (5 fixes)#4484
chiga0 wants to merge 1 commit into
QwenLM:daemon_mode_b_mainfrom
chiga0:feat/daemon-cross-client-sync-completeness

Conversation

@chiga0
Copy link
Copy Markdown
Collaborator

@chiga0 chiga0 commented May 24, 2026

Summary

A cross-client real-time sync audit of daemon_mode_b_main (2026-05-24) surfaced eight gaps where one client's actions did not propagate to other SSE-subscribed clients on the same session. This PR closes the five that are bridge-layer mechanical fixes — user_message_chunk echo on the prompt path, prompt_cancelled broadcast on cancel, a replay_complete sentinel for Last-Event-ID resume, and envelope-level originatorClientId on session_metadata_updated + session_closed. The remaining three (in-session ACP setModel bus emit, workspace-wide non-persisted approval-mode broadcast, permission_resolved originator-vs-voter semantics) need larger design alignment and are tracked as separate follow-ups.

What this PR delivers

1. `user_message_chunk` echo on the interactive prompt path

The agent's `Session#executePrompt` forwards the prompt straight to the LLM without emitting `user_message_chunk` to the session bus. The cron path (`Session.ts:1402`) and `HistoryReplayer` (`HistoryReplayer.ts:65`) DO emit it; only the interactive path was the outlier. Result: when client A sent a prompt, other clients on the same session saw only the agent's reply — never the input — until a session reload.

Fix: new `echoPromptToSessionBus` helper in `bridge.ts` publishes one `user_message_chunk` per content block of the incoming `PromptRequest`, stamped with envelope-level `originatorClientId` so SDK consumers with `suppressOwnUserEcho: true` filter the echo on the originator's own UI. Multi-modal blocks pass through verbatim for future-compat with Core's multi-modal echo work (PR #4353 §D).

`_meta.source: 'bridge-echo'` distinguishes bridge-synthesized echoes from agent-emitted content.

2. `prompt_cancelled` broadcast in cancelSession

`bridge.cancelSession` forwarded the ACP cancel notification to the agent and resolved pending permissions, but did NOT publish any event on the session bus. Other clients learned that A had cancelled only by absence of further `agent_message_chunk` frames — heuristic and late.

Fix: emit a `prompt_cancelled` envelope before the ACP forward so peer clients see the cancel as a first-class event. Envelope-level `originatorClientId` identifies the cancelling client.

3. `replay_complete` sentinel in EventBus.subscribe

A consumer attaching via `Last-Event-ID: ` had no positive signal when the replay loop drained — they had to heuristically time out the catch-up spinner. The state-resync path already had a synthetic `state_resync_required` frame; the success path lacked parity.

Fix: emit an id-less `replay_complete` synthetic frame at the end of the replay loop. Fires both when replay actually delivered frames AND when there was nothing to replay (empty ring). `data.replayedCount` is the actual count of force-pushed frames (not derived from id arithmetic, which would over-count when the state-resync path leaves a hole before the ring's earliest id).

4. `originatorClientId` on `session_metadata_updated` envelope

`updateSessionMetadata` resolved the trusted client id for validation but did not stamp it on the broadcast envelope. Sibling events (`model_switched`, `approval_mode_changed`) all stamp envelope-level `originatorClientId`; this brings metadata broadcast to parity.

5. `originatorClientId` on `session_closed` envelope

`session_closed` carried the closing client in `data.closedBy` only, but every other event uses the envelope-level `originatorClientId` field. Added the envelope stamp; kept `data.closedBy` for back-compat.

Out of scope (deferred to follow-ups)

The audit also surfaced 3 items that require larger design discussion. Tracked as separate follow-up PRs:

  • In-session ACP `setModel` bus emit — `Session.ts#setModel` calls `config.switchModel` directly without going through the bridge's publish path. Fixing this requires a new ACP sessionUpdate type (`current_model_update`, parallel to existing `current_mode_update`) or a side-channel callback.
  • Workspace-wide broadcast of non-persisted approval-mode changes — current behavior only broadcasts workspace-wide on `persist=true`; needs design clarity on the persist flag's relationship to multi-client visibility.
  • Serialize `setSessionApprovalMode` — analogous to `entry.modelChangeQueue` for `setSessionModel` to fix the concurrent-toggle race.
  • Reconcile `permission_resolved.originatorClientId` semantics — currently carries the VOTER's clientId; `permission_request` carries the prompt originator. Inconsistent. Either change to consistent semantics across the two events, or add a separate `voterClientId` field.

Deployment coordination note

Downstream products that ship their own gateway with a user-message echo workaround (e.g., `web-terminal-sandboxs` `qwen-gateway`) will produce double frames once this daemon-side echo lands. Recommended rollout: deploy daemon first with the new echo, then flip the gateway's `GATEWAY_ECHO_USER_MESSAGE=false` env flag. The `_meta.source` marker (`'bridge-echo'` from daemon vs `'gateway-echo'` from gateway) lets future SDK-side dedup catch leftover misconfiguration.

Validation

```bash
cd packages/acp-bridge
npx vitest run # 291/291 pass (3 updated for replay_complete)
npx tsc --noEmit # clean
```

Linked

Audit (cross-client sync, 2026-05-24) of the daemon's per-session
EventBus fan-out surfaced gaps where one client's actions did not
propagate to other SSE-subscribed clients on the same session. This
commit closes five of them — all bridge-layer fixes, no agent-side
changes — with regression tests covering the new sentinel frame.

## 1. user_message_chunk echo on the interactive prompt path

The agent's `Session#executePrompt` (Session.ts:556+) forwards the
prompt straight to the LLM without emitting `user_message_chunk` to
the session bus. The cron path (Session.ts:1402) and HistoryReplayer
(HistoryReplayer.ts:65) DO emit it; only the interactive path was the
outlier. Result: when client A sent a prompt, other clients on the
same session saw only the agent's reply, never the input — they had
to wait for a session reload to learn what A had asked.

Fix: `echoPromptToSessionBus` helper publishes one `user_message_chunk`
per content block of the incoming `PromptRequest`, stamped with the
envelope-level `originatorClientId` so SDK consumers with
`suppressOwnUserEcho: true` filter the echo on the originator's UI.
Multi-modal blocks (image / audio / resource) pass through verbatim
for future-compat with Core's multi-modal echo work.

`_meta.source: 'bridge-echo'` distinguishes bridge-synthesized echoes
from agent-emitted content. Used today only for diagnostic visibility;
becomes load-bearing once SDK-side dedup matures (deferred follow-up).

## 2. prompt_cancelled broadcast in cancelSession

`bridge.cancelSession` forwarded the ACP cancel notification to the
agent and resolved pending permissions, but did NOT publish any event
on the session bus. Other clients learned that A had cancelled only
by absence of further `agent_message_chunk` frames — heuristic and
late.

Fix: emit a `prompt_cancelled` envelope before the ACP forward so
peer clients see the cancel as a first-class event. Envelope-level
`originatorClientId` identifies the cancelling client (the one calling
`POST /cancel`). Permission-resolution events generated by the
subsequent `cancelPendingForSession` continue to omit an originator
(those are system-initiated wind-downs, not user-voted).

## 3. replay_complete sentinel in EventBus.subscribe

A consumer attaching via `Last-Event-ID: <n>` had no positive signal
when the replay loop drained — they had to heuristically time out the
catch-up spinner. The state-resync path already had a synthetic
`state_resync_required` frame; the success path lacked parity.

Fix: emit an id-less `replay_complete` synthetic frame at the end of
the replay loop (same pattern as `client_evicted` / `state_resync_required`
— no slot in the per-session monotonic sequence). Fires both when
replay actually delivered frames AND when there was nothing to replay
(empty ring), so the consumer always sees the transition from
"catching up" to "live". `data.replayedCount` is the actual count of
force-pushed frames (not derived from id arithmetic, which would
over-count when the state-resync path leaves a hole before the ring's
earliest id).

3 EventBus test cases updated to assert the sentinel frame ordering.

## 4. originatorClientId on session_metadata_updated envelope

`updateSessionMetadata` resolved the trusted client id for validation
(`resolveTrustedClientId(entry, context.clientId)`) but did not stamp
it on the broadcast envelope. UIs couldn't attribute the rename to a
specific client. Sibling events (`model_switched`, `approval_mode_changed`)
all stamp envelope-level `originatorClientId`; this brings the metadata
broadcast to parity.

## 5. originatorClientId on session_closed envelope

`session_closed` carried the closing client in `data.closedBy` only,
but every other event the bridge publishes uses the envelope-level
`originatorClientId` field. Added the envelope-level stamp (kept
`data.closedBy` for back-compat) so SDK consumers can read the
attribution from the same place across all event types.

## Out-of-scope (deferred to follow-up)

The cross-client sync audit also surfaced 3 items that require larger
design discussion:

- **In-session ACP `setModel` bus emit** — `Session.ts#setModel` calls
  `config.switchModel` directly without going through the bridge's
  publish path. Fixing this requires a new ACP sessionUpdate type
  (`current_model_update`, parallel to existing `current_mode_update`)
  or a side-channel callback from agent to bridge.
- **Workspace-wide broadcast of non-persisted approval-mode changes** —
  current behavior only broadcasts workspace-wide on `persist=true`;
  the design intent of the persist flag relative to multi-client
  visibility needs alignment.
- **Serialize `setSessionApprovalMode` through a queue** — analogous to
  `entry.modelChangeQueue` for `setSessionModel`. Race-condition fix.
- **Reconcile `permission_resolved.originatorClientId` semantics** —
  it currently carries the VOTER's clientId; `permission_request`
  carries the prompt originator. SDK consumers need to special-case
  the type. Either change to consistent semantics or add a separate
  `voterClientId` field.

These are tracked as follow-ups, not in this PR.

## Validation

| | |
|---|---|
| Bridge tests | 291/291 pass |
| eventBus tests | 105/105 pass (3 updated) |
| TypeScript | clean |
@github-actions
Copy link
Copy Markdown
Contributor

📋 Review Summary

This PR addresses five cross-client real-time sync gaps identified in a daemon_mode_b_main audit, implementing mechanical bridge-layer fixes for multi-client SSE propagation. The changes are well-documented, follow existing patterns, and include appropriate test updates. Overall assessment: solid implementation with excellent documentation, though a few edge cases and consistency improvements should be addressed before merging.

🔍 General Feedback

  • Documentation quality is exceptional: Inline comments thoroughly explain the "why" behind each change, reference related PRs (feat(sdk/daemon-ui): unified completeness follow-up to #4328 #4353, fix(core): handle MiMo tool-result media #4281), and document design trade-offs. This is exemplary.
  • Consistent patterns: All five fixes follow established envelope patterns (originatorClientId stamping, try/catch for closed buses, synthetic frame conventions).
  • Test coverage: Tests are updated to reflect the new replay_complete sentinel behavior, though test coverage for the bridge.ts changes appears minimal.
  • Schema version handling: The replay_complete sentinel correctly uses EVENT_SCHEMA_VERSION and follows the id-less synthetic frame pattern.
  • Error handling: Appropriate defensive try/catch around bus publishes, with silent failures being acceptable for torn-down sessions.

🎯 Specific Feedback

🟡 High

  • packages/acp-bridge/src/bridge.ts:1957-1980 - Missing test coverage for echoPromptToSessionBus: The new echoPromptToSessionBus function and its integration at the interactive prompt path lack dedicated test coverage. Given this fixes a critical cross-client sync gap, tests should verify:

    • Multi-block prompts emit one user_message_chunk per block
    • originatorClientId is correctly stamped on the envelope
    • _meta.source: 'bridge-echo' appears on emitted events
    • Non-array/non-object prompt parts are safely skipped
    • Bus closed during publish doesn't throw

    Recommendation: Add tests in a bridge.test.ts file (or existing test file) covering these scenarios before merging.

  • packages/acp-bridge/src/bridge.ts:2076-2097 - prompt_cancelled broadcast timing race: The comment states "Publish BEFORE the ACP cancel forward so other clients learn about the cancel even if the agent is slow to wind down" — however, if the bus publish succeeds but the subsequent entry.connection.cancel() throws, the cancel event has been broadcast but the agent never received the cancellation. This could leave peer clients thinking a prompt was cancelled when the agent is still processing it.

    Recommendation: Consider whether the prompt_cancelled broadcast should only occur after entry.connection.cancel() resolves successfully, or document this edge case as acceptable (the agent will eventually resolve with stopReason: 'cancelled' even if the notification delivery is delayed).

🟢 Medium

  • packages/acp-bridge/src/eventBus.ts:417-447 - replay_complete sentinel ordering with state_resync_required: When a ring eviction occurs (gap detected), the code pushes state_resync_required first, then replay frames, then replay_complete. However, the replay_complete sentinel doesn't indicate whether a resync was required earlier in the stream. Consumers seeing state_resync_required followed by replay_complete may need to correlate these events.

    Suggestion: Consider adding stateResyncRequired: true to the replay_complete data payload when a resync frame was emitted, making it easier for consumers to correlate the recovery episode.

  • packages/acp-bridge/src/bridge.ts:230-290 - echoPromptToSessionBus type safety: The function casts req as { prompt?: unknown } and performs runtime type checks, which is reasonable. However, the type cast suggests PromptRequest may not formally include a prompt field, or the field type is broader than expected.

    Suggestion: Verify PromptRequest type definition includes prompt with the correct union type, or add a type guard helper to avoid repeated casts if this pattern appears elsewhere.

  • packages/acp-bridge/src/eventBus.test.ts:620-635 - Empty-ring test assertion order: The test asserts out[0]?.type is 'replay_complete' and out[1]?.type is 'foo', but the comment mentions "2 total" without explicitly asserting out.length === 2. The if (out.length === 2) break; ensures it, but an explicit length assertion would make the test more robust.

    Suggestion: Add expect(out).toHaveLength(2); before the type assertions for clarity.

🔵 Low

  • packages/acp-bridge/src/bridge.ts:1920-1955 - Comment verbosity trade-off: The inline documentation for echoPromptToSessionBus is thorough (excellent!), but the block comment at lines 1967-1982 repeats much of the same information. Consider whether the call-site comment adds value beyond the function-level JSDoc, or consolidate to reduce redundancy.

  • packages/acp-bridge/src/eventBus.ts:437 - Synthetic frame comment placement: The comment explaining why replay_complete is id-less ("so it doesn't burn a slot in the per-session sequence") is valuable but appears mid-function. Consider moving this design rationale to the top of the subscribe function or a module-level comment about synthetic frames.

  • packages/acp-bridge/src/bridge.ts:2392-2437 - metadataOriginatorClientId naming: The variable name is precise but verbose. Given the pattern established elsewhere (cancelOriginatorClientId, originatorClientId), consider whether originatorClientId alone suffices within the local scope, or keep the descriptive name for clarity.

✅ Highlights

  • Exceptional documentation: The JSDoc and inline comments set a high bar for maintainability. Future contributors will understand the "why" behind these changes.
  • Consistent envelope patterns: All five fixes correctly stamp originatorClientId at the envelope level, maintaining parity with existing events (model_switched, approval_mode_changed, etc.).
  • Defensive error handling: Try/catch around bus publishes with appropriate silent failures for torn-down sessions.
  • Schema discipline: The replay_complete sentinel correctly follows the id-less synthetic frame convention and uses EVENT_SCHEMA_VERSION.
  • Test updates: Existing tests are properly updated to account for the new sentinel behavior, with clear comments explaining the expected event counts.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant