Skip to content

feat(serve+sdk): F4 prereq — daemon protocol completion (serverTimestamp / provenance / errorKind / state_resync_required)#4360

Merged
doudouOUC merged 6 commits into
daemon_mode_b_mainfrom
feat/f4-prereq-protocol-completion
May 21, 2026
Merged

feat(serve+sdk): F4 prereq — daemon protocol completion (serverTimestamp / provenance / errorKind / state_resync_required)#4360
doudouOUC merged 6 commits into
daemon_mode_b_mainfrom
feat/f4-prereq-protocol-completion

Conversation

@doudouOUC
Copy link
Copy Markdown
Collaborator

Summary

Two F4 prerequisites bundled into one PR against daemon_mode_b_main. Both are protocol-level daemon completions that the F4 client-adapter wave needs in place before it can render correctly:

  1. Error after selecting authentication method: "Cannot read properties of undefined (reading 'value')" #19 stamping (14637cd79) — Three daemon-side stamping additions chiga0 called out in proposal(serve): Mode B feature-priority roadmap toward v0.16 production-ready #4175 comment Error after selecting authentication method: "Cannot read properties of undefined (reading 'value')" #19. SDK side (PR feat(sdk/daemon-ui): unified completeness follow-up to #4328 (PR-A through PR-E) #4353) is forward-compat ready; we just need daemon to start emitting these fields. UI affordances are inert without them.

  2. 运行不了 #15 SSE reducer gap detection (c1a2f0a78) — Closes the multi-client state divergence bug Ilya0527 raised in proposal(serve): Mode B feature-priority roadmap toward v0.16 production-ready #4175 comment 运行不了 #15. When an SSE consumer reconnects with Last-Event-ID past the ring eviction point, daemon now emits state_resync_required so the SDK reducer can refuse to apply stale deltas until consumer recovery.

Why bundle: both are small (~80 + ~150 LOC), both touch eventBus / SSE envelope area, both are F4 prerequisites. Single review unit is more efficient than two parallel PRs. Each ships as its own commit so review can step through them independently.

Commit 1 — 14637cd79 (#19 stamping)

Files

  • packages/cli/src/serve/server.tsformatSseFrame() stamps _meta.serverTimestamp at the SSE write boundary. Preserves pre-existing _meta keys (e.g. tool_call's _meta.toolName) via spread merge.
  • packages/cli/src/serve/server.tsstream_error emit stamps errorKind via mapDomainErrorToErrorKind(err) when classifiable. undefined for unclassified errors → field omitted (strictly additive).
  • packages/cli/src/acp-integration/session/emitters/ToolCallEmitter.ts — new static resolveToolProvenance(toolName, subagentMeta){ provenance: 'builtin' | 'mcp' | 'subagent'; serverId? }. Stamped on emitStart AND emitResult AND emitError (reconnecting clients can re-derive provenance from any update event).

Design decisions (confirmed via AskUserQuestion before coding)

# Choice Why
1 serverTimestamp stamped at formatSseFrame boundary SSE-only; doesn't pollute in-memory BridgeEvent type. SDK reads via _meta.serverTimestamp (Anthropic convention).
2 Provenance via mcp__<server>__<tool> naming heuristic Matches core MCP tool naming + SDK's own fallback in PR #4353. No new ctx-dep on emit hot path.

NOT stamped: session_died errorKind

The 3 session_died emit sites in acp-bridge/bridge.ts don't have a classifiable err in scope (channel_closed carries exitCode/signalCode only; killed + daemon_shutdown are user/operator-initiated, not domain errors). Threading channel-spawn errors through to enable errorKind: 'init_timeout' / missing_binary classification is left for a separate PR to avoid mixing protocol stamping with lifecycle plumbing.

Commit 2 — c1a2f0a78 (#15 gap detection)

Daemon side (packages/acp-bridge/src/eventBus.ts)

In subscribe()'s replay path, detect ring eviction by comparing this.ring[0]?.id against lastEventId + 1. When earliest > last + 1, force-push a synthetic terminal frame:

{ v: 1, type: 'state_resync_required',
  data: { reason: 'ring_evicted',
          lastDeliveredId: N,
          earliestAvailableId: M } }

The frame has NO id (mirrors client_evicted synthetic terminal pattern). Replay continues after it — adapters get the option to compute a "what you missed" diff later; the SDK reducer auto-skips stale deltas regardless.

SDK side (packages/sdk-typescript/src/daemon/events.ts)

Adds the event type to DAEMON_EVENT_TYPES, the data interface + event type + predicate, widens DaemonStreamLifecycleEvent, adds reducer state fields (awaitingResync / resyncRequiredCount / lastResyncRequired?), and gates non-terminal events at the top of the reducer when awaitingResync === true. The RESYNC_PASSTHROUGH_TYPES set documents which event types still apply (terminal lifecycle frames must, so a session that dies during resync limbo is observable).

Design decisions (confirmed via AskUserQuestion before coding)

# Choice Why
3 Continue replay after state_resync_required Network-friendly (no reconnect). SDK has option to compute diff later.
4 SDK reducer auto-skip delta when awaitingResync Safer default. UI doesn't render against known-stale state. Consumer recovers explicitly via loadSession + createDaemonSessionViewState.

Consumer recovery contract

When state.awaitingResync === true, consumer code calls loadSession (out of band — the SDK doesn't see that as an event), then reconstructs view state via createDaemonSessionViewState({...seedFromLoadSession}). The fresh state defaults awaitingResync: false so seeding implicitly clears the flag.

Test plan

  • npx vitest run packages/cli/src/serve/server.test.ts — +3 new tests (serverTimestamp + stream_error errorKind)
  • npx vitest run packages/cli/src/acp-integration/session/emitters/ToolCallEmitter.test.ts — 46 tests pass (+11 new: resolveToolProvenance static + provenance stamping on all 3 emit paths)
  • npx vitest run packages/cli/src/acp-integration/session/HistoryReplayer.test.ts — 17 tests pass (provenance flows through replay)
  • cd packages/acp-bridge && npx vitest run — 6 files, 108/108 pass (+5 new resync detection tests)
  • cd packages/sdk-typescript && npx vitest run — 13 files, 451/451 pass (+8 new reducer resync tests)
  • Pre-commit hook (prettier + eslint --max-warnings 0) clean on all 11 modified files
  • CI green across Linux / macOS / Windows
  • Smoke: qwen serve + curl SSE → verify _meta.serverTimestamp on every frame; trigger an error → verify errorKind; force ring eviction → verify state_resync_required frame

Backward compat

  • All daemon stampings are additive. Old SDKs ignore _meta.serverTimestamp / provenance / serverId / errorKind fields. New SDKs read them.
  • state_resync_required is a NEW event type; daemons predating this PR never emit it. SDK reducer's switch fall-through ignores unknown types (per existing unrecognizedKnownEventCount semantics).
  • DaemonStreamErrorData.errorKind? is optional — old daemons that don't stamp it remain valid.
  • DaemonSessionViewState widened with awaitingResync / resyncRequiredCount / lastResyncRequired? — consumer reseeding via createDaemonSessionViewState() (no args) defaults awaitingResync: false, so adapters that never saw a resync event behave identically.

Refs

🤖 Generated with Qwen Code

doudouOUC added 2 commits May 20, 2026 20:29
…aemon events (#4175 F4 prereq)

Adopts chiga0's three P0 SDK-side blockers from #4175 comment #19 —
the SDK side already consumes these fields (PR #4353), but daemon
hadn't stamped them yet, leaving the corresponding UI affordances
inert. All three stampings are purely additive on the wire and don't
require any SDK type changes (SDK already has forward-compat field
slots).

**#19.1 — `_meta.serverTimestamp` on every SSE frame** (`server.ts`
`formatSseFrame()`)

Stamped at the SSE write boundary (NOT EventBus.publish) so the
in-memory `BridgeEvent` type stays unchanged and internal consumers
don't see `_meta`. Pre-existing `_meta` keys (e.g. tool_call's
`_meta.toolName`) are preserved via spread merge. SDK reads via the
3-location probe in `extractServerTimestamp` (chiga0's PR #4353);
we pick `_meta.serverTimestamp` (Anthropic convention) so top-level
event type stays unpolluted.

Why this matters: pre-fix, multi-client UIs showing "X minutes ago"
or sorting transcript blocks by emit time used each client's local
clock — drifts of tens of seconds to minutes across browsers/tabs/
mobile produced visibly inconsistent timestamps.

**#19.2 — `tool_call` `provenance` + `serverId` on every emitter
event** (`ToolCallEmitter.ts`)

New static `ToolCallEmitter.resolveToolProvenance(toolName,
subagentMeta)` returns `{ provenance: 'builtin' | 'mcp' | 'subagent';
serverId? }`. Resolution rules (per user-confirmed design decision
from issue comment): subagent takes precedence (set when
subagentMeta is present); `mcp__<server>__<tool>` naming heuristic
classifies MCP tools with serverId; everything else is builtin.

Stamped on `emitStart` AND `emitResult` AND `emitError` (all three
emit paths) so a reconnecting client receiving a `tool_call_update`
frame from the replay ring (without the original `tool_call` start
event) can still derive the provenance. Provenance is stable per
tool, so stamping on every event is redundant — but the marginal
serialization cost is tiny and reconnect correctness wins.

Chose the naming heuristic (not ToolRegistry lookup) per user
confirmation: matches the SDK's own fallback (chiga0 PR #4353), no
new ctx-dep on emit hot path, no signature changes.

**#19.3 — `errorKind` on `stream_error`** (`server.ts` line ~1955)

Stamped via `mapDomainErrorToErrorKind(err)` — the 7-value classifier
already lives in `@qwen-code/acp-bridge/status.ts` since #4319. When
the classifier returns `undefined` (generic Error etc.) the field is
omitted — strictly additive. SDK consumers handle "errorKind absent"
as before (fall back to rendering `error` text).

NOT stamped on `session_died` because the 3 emit sites in `acp-bridge/
bridge.ts` don't have a classifiable `err` in scope:
- `channel_closed` carries only exitCode/signalCode (no error)
- `killed` is user-initiated (no domain error)
- `daemon_shutdown` is operator-initiated (no domain error)

A follow-up could thread channel-spawn errors through to the
session_died emit site to enable `errorKind: 'init_timeout'` /
`missing_binary` classification — left for a separate PR to avoid
mixing protocol stamping with lifecycle plumbing.

Verification
- `npx vitest run packages/cli/src/serve/server.test.ts -t "serverTimestamp|stream_error|errorKind"` — 5 pass
- `npx vitest run packages/cli/src/acp-integration/session/emitters/ToolCallEmitter.test.ts` — 46 pass (+ 11 new tests for resolveToolProvenance + provenance stamping on all 3 emit paths)
- `npx vitest run packages/cli/src/acp-integration/session/HistoryReplayer.test.ts` — 17 pass
- TypeScript clean on touched regions; pre-existing F3 (#4335 merge) errors elsewhere are unrelated.

Existing test updates
- 15 `_meta: { toolName: 'X' }` assertions in ToolCallEmitter.test.ts updated to include `provenance: 'builtin'` (defensive — catches accidental drift if a future refactor stops stamping). 2 strict-equality assertions in HistoryReplayer.test.ts similarly updated. The first SSE-frame test in server.test.ts switched from `toEqual` to `toMatchObject` since `_meta.serverTimestamp` makes exact equality brittle; a dedicated test pins the new field's shape.
…ync_required (#4175 F4 prereq)

Closes the multi-client SSE reducer divergence bug Ilya0527 raised in
#4175 comment #15. Pre-fix scenario:

1. Consumer's SSE stream drops; client buffers `Last-Event-ID: N`.
2. Network reconnects long enough later that events `[N+1, ringHead-1]`
   were evicted from the daemon's per-session ring.
3. Daemon's `subscribe({lastEventId: N})` silently replays only the
   surviving suffix.
4. Consumer's SDK reducer keeps applying deltas as if the stream was
   contiguous. Its state has now drifted from the daemon's truth —
   no terminal signal, no warning. The `SessionState` reducer's
   "same event stream in → same state out" purity guarantee is
   broken.

The bug's blast radius is exactly when multi-client matters: F4
brings up the TUI / IDE / web client adapters that share session
state, so divergence becomes visibly inconsistent across clients.

**Daemon side** (`packages/acp-bridge/src/eventBus.ts`)

In `subscribe()`'s replay path, detect ring eviction by comparing
the ring's earliest id against `lastEventId + 1`. When a gap exists,
force-push a synthetic terminal `state_resync_required` frame BEFORE
the surviving replay events:

```
{ v: 1, type: 'state_resync_required',
  data: { reason: 'ring_evicted',
          lastDeliveredId: N,
          earliestAvailableId: M } }
```

Per user-confirmed design (issue comment discussion): the frame has
NO `id` (mirrors the `client_evicted` synthetic terminal pattern so
it doesn't burn a slot in the per-session monotonic sequence). Replay
continues after the resync frame — the SDK reducer auto-skips
subsequent deltas (see below) but the frames stay on the wire so
adapters have the option to compute a "what you missed" diff later.

**SDK side** (`packages/sdk-typescript/src/daemon/events.ts`)

Adds:
- `'state_resync_required'` to `DAEMON_EVENT_TYPES` union
- `DaemonStateResyncRequiredData` + `DaemonStateResyncRequiredEvent`
- `isStateResyncRequiredData` predicate
- `DaemonStreamLifecycleEvent` union widened
- Reducer state fields: `awaitingResync: boolean`,
  `resyncRequiredCount: number`, `lastResyncRequired?`
- Reducer case for `state_resync_required` — sets the flag, increments
  count, records data
- **Top-of-reducer gate**: when `awaitingResync === true`, all non-
  terminal events are auto-skipped (still advance `lastEventId`).
  Terminal lifecycle events (`session_died` / `session_closed` /
  `client_evicted` / `stream_error`) STILL apply — critical end-of-
  stream signals don't depend on prior state being current.
- Re-exported `DaemonStateResyncRequiredData` / Event from
  `daemon/index.ts` and `src/index.ts` (matches surface posture of
  sibling lifecycle types).

Consumer recovery contract: when `state.awaitingResync === true`,
call `loadSession` (out of band) to fetch the daemon's canonical
session snapshot, then reconstruct view state via
`createDaemonSessionViewState({...seed from loaded state})`. The
fresh state defaults `awaitingResync: false` so the seed implicitly
clears the flag.

**Side fix** (`stream_error` errorKind)

`DaemonStreamErrorData.errorKind?: string` typed for the optional
classification field that Commit 1 (`14637cd79`) added daemon-side.
Strictly additive — old daemons omit the field, SDK falls back to
rendering `error` text.

Verification
- `packages/acp-bridge`: 6 files, 108/108 pass (+5 new resync-detection
  tests; 1 existing "default ring size 8000" test updated to acknowledge
  the synthetic resync frame at the head of its replay batch).
- `packages/sdk-typescript`: 13 files, 451/451 pass (+8 new reducer
  resync tests covering set/skip/terminal-passthrough/recovery/
  repeated-resync/malformed-payload).
- TypeScript clean across both packages on touched regions.
@github-actions
Copy link
Copy Markdown
Contributor

📋 Review Summary

This PR bundles two F4 prerequisites: (1) protocol-level stamping additions for SSE events (server timestamp, error classification, tool provenance), and (2) gap detection for SSE reconnects past the ring eviction point. Both are well-implemented with comprehensive test coverage. The changes are strictly additive and backward-compatible. Overall assessment: strong implementation with excellent documentation, though a few edge cases and naming considerations warrant attention before merge.

🔍 General Feedback

  • Excellent documentation: Inline comments thoroughly explain design decisions, reference issue numbers (proposal(serve): Mode B feature-priority roadmap toward v0.16 production-ready #4175), and document the "why" behind each change. This is exemplary.
  • Clean separation of concerns: Each commit addresses one concern independently, making review easier despite the bundle.
  • Comprehensive test coverage: Both commits include extensive unit tests covering normal paths, edge cases, and boundary conditions.
  • Backward compatibility: All changes are additive—new fields are optional, and existing behavior is preserved.
  • Type safety: SDK TypeScript changes maintain strict typing with proper type guards and validation functions.
  • Network-friendly design: The gap detection replay-continues-after-resync pattern avoids unnecessary reconnections.

🎯 Specific Feedback

🟡 High Priority

  1. packages/cli/src/acp-integration/session/emitters/ToolCallEmitter.ts - The resolveToolProvenance method is referenced in tests but not implemented in the source file. The tests call ToolCallEmitter.resolveToolProvenance() as a static method, but the current implementation only has instance methods. This appears to be a missing implementation that would cause test failures.

    • Recommendation: Add the static resolveToolProvenance(toolName, subagentMeta?) method to classify tool provenance as 'builtin' | 'mcp' | 'subagent' with the serverId extraction logic for MCP tools.
  2. packages/cli/src/serve/server.ts:2553-2571 - The _meta.serverTimestamp stamping happens at the SSE wire boundary via spread merge, but there's a potential type safety issue: BridgeEvent type doesn't include _meta in its definition, so the cast (event as { _meta?: Record<string, unknown> })._meta bypasses type checking.

    • Recommendation: Consider adding _meta?: Record<string, unknown> to the BridgeEvent type definition or create a typed interface for stamped events to maintain end-to-end type safety.

🟢 Medium Priority

  1. packages/acp-bridge/src/eventBus.ts:357-402 - The gap detection logic compares earliestInRing > opts.lastEventId + 1, but the check doesn't account for the case where opts.lastEventId equals Number.MAX_SAFE_INTEGER (integer overflow edge case). While extremely rare in practice, this could cause incorrect gap detection.

    • Recommendation: Add a guard: opts.lastEventId < Number.MAX_SAFE_INTEGER to the condition, or document this as a known limitation.
  2. packages/sdk-typescript/src/daemon/events.ts:1101-1114 - The auto-skip gate for awaitingResync advances lastEventId via advanceLastEventId(base) implicitly, but this isn't immediately clear from the code. The comment explains it, but the actual advancement happens in the base spread at the end of each case block.

    • Recommendation: Consider extracting the skip logic into a helper function with a clearer comment: // Skip delta application but preserve lastEventId advancement from base state.
  3. packages/sdk-typescript/src/daemon/events.ts:864-884 - The RESYNC_PASSTHROUGH_TYPES set includes terminal events but the comment lists session_died, session_closed, client_evicted, stream_error. However, session_closed is not in the actual set—only 4 types are listed. This is a documentation/code mismatch.

    • Recommendation: Either add session_closed to the set if it should pass through, or remove it from the comment if intentional.

🔵 Low Priority

  1. packages/cli/src/serve/server.ts:1944-1965 - The errorKind stamping uses ...(errorKind ? { errorKind } : {}) which is correct, but the pattern could be simplified to errorKind (short-hand) if the field is always defined when classified. The current approach is defensive but slightly verbose.

    • Recommendation: Consider defining errorKind as string | undefined in the type and letting JSON.stringify omit undefined values naturally (requires JSON.stringify replacer or type change).
  2. packages/cli/src/acp-integration/session/emitters/ToolCallEmitter.test.ts:700-800 - The test suite for resolveToolProvenance is comprehensive, but the test naming convention could be more consistent. Some tests use "classifies" while others use "stamps".

    • Suggestion: Standardize test descriptions to follow the pattern: "classifies [input] as [provenance] when [condition]".
  3. packages/sdk-typescript/src/daemon/events.ts:23-35 - The comment for state_resync_required mentions "Synthetic (no id) so it doesn't burn a slot in the per-session monotonic sequence." This is a critical design decision that deserves a code comment at the emission site in eventBus.ts as well, not just the SDK side.

    • Suggestion: Add a cross-reference comment in eventBus.ts pointing to this SDK comment for bidirectional traceability.
  4. packages/sdk-typescript/test/unit/daemonEvents.test.ts:2388-2396 - The test "rejects malformed state_resync_required payload via unrecognizedKnownEventCount" is excellent, but it would benefit from also testing the case where reason is empty string (which isNonEmptyString should reject).

    • Suggestion: Add a test case with reason: '' to verify the validation handles empty strings correctly.

✅ Highlights

  • Gap detection algorithm (packages/acp-bridge/src/eventBus.ts:380-397): The ring eviction detection is elegantly simple—comparing earliest-in-ring against lastEventId+1—and the synthetic frame pattern mirrors existing client_evicted semantics. This is a clean solution to a subtle multi-client state divergence bug.
  • Provenance classification heuristic (packages/cli/src/acp-integration/session/emitters/ToolCallEmitter.test.ts:706-780): The mcp__<server>__<tool> naming convention with subagent override is a pragmatic approach that requires no new context dependencies on the emit hot path.
  • Reducer auto-skip logic (packages/sdk-typescript/src/daemon/events.ts:1101-1114): The gate that skips non-terminal deltas while awaitingResync is true, but still advances lastEventId, is precisely the right trade-off between correctness and recovery flexibility.
  • Test coverage for boundary conditions: The tests for exact boundary cases (lastEventId === earliest - 1), empty rings, and fresh subscribes demonstrate thorough thinking about edge cases.
  • Commit organization: Each feature is a self-contained commit with clear boundaries, making rollback or cherry-pick straightforward if needed.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR completes two daemon protocol prerequisites needed for the F4 client-adapter wave: (1) daemon-side “stamping” of additional metadata (_meta.serverTimestamp, tool-call provenance, and errorKind on stream_error) and (2) SSE replay gap detection that emits a new state_resync_required event when a client resumes past the EventBus ring eviction boundary.

Changes:

  • Daemon now stamps _meta.serverTimestamp on SSE frames, stamps errorKind on stream_error when classifiable, and stamps tool-call provenance (builtin/mcp/subagent + optional serverId) in tool call events.
  • EventBus subscribe() detects ring-eviction gaps on resume and force-pushes a synthetic state_resync_required frame before replay frames.
  • SDK adds the new state_resync_required event type and reducer behavior (awaitingResync) to skip non-terminal deltas until the consumer reseeds via loadSession.

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
packages/acp-bridge/src/eventBus.ts Emit state_resync_required when Last-Event-ID is behind the ring head due to eviction.
packages/acp-bridge/src/eventBus.test.ts Adds/updates unit tests for resync gap detection and ordering vs replay frames.
packages/cli/src/serve/server.ts Stamps _meta.serverTimestamp at SSE write boundary; stamps errorKind on stream_error.
packages/cli/src/serve/server.test.ts Updates SSE tests to account for _meta.serverTimestamp and validates errorKind stamping.
packages/cli/src/acp-integration/session/emitters/ToolCallEmitter.ts Adds provenance stamping and resolveToolProvenance() helper for tool calls.
packages/cli/src/acp-integration/session/emitters/ToolCallEmitter.test.ts Adds tests for provenance resolution and stamping across emit paths.
packages/cli/src/acp-integration/session/HistoryReplayer.test.ts Updates replay expectations to include provenance in _meta.
packages/sdk-typescript/src/daemon/events.ts Adds event typing + reducer state for state_resync_required and resync gating behavior.
packages/sdk-typescript/src/daemon/index.ts Re-exports the new resync event/data types.
packages/sdk-typescript/src/index.ts Re-exports the new resync event/data types from the SDK root.
packages/sdk-typescript/test/unit/daemonEvents.test.ts Adds reducer tests covering resync state, skip gating, and terminal passthrough.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread packages/sdk-typescript/src/daemon/events.ts Outdated
Comment thread packages/cli/src/acp-integration/session/emitters/ToolCallEmitter.ts Outdated
Comment thread packages/cli/src/serve/server.test.ts Outdated
…x round 2 fold-in)

Adopts Codex review round 2 P2 finding on PR #4360 — fold-in to the
F4 prereq scope per user's "a" decision.

**Problem**: When the `BridgeFileSystem` adapter (introduced in
#4334 fs adapter wiring) throws a structured `FsError` (e.g.
`kind: 'untrusted_workspace'` / `kind: 'symlink_escape'` / `kind:
'file_too_large'`), the `@agentclientprotocol/sdk` default RPC
error serialization only sends `error.message` as JSON-RPC -32603
"Internal error". The structured `kind` / `status` / `hint` fields
on FsError are stripped on the way to the agent.

Downstream impact: SDK consumers receiving the ACP error payload
lose the typed discriminator and have to regex-match the human-
readable message to dispatch UI (auth retry vs file picker vs
proxy hint). This silently regresses what the FsError-typed
contract was supposed to provide.

**Fix**: At the bridge boundary (`BridgeClient.writeTextFile` and
`BridgeClient.readTextFile`), catch errors from `this.fileSystem.
writeText/readText` calls. Duck-type check for FsError shape
(`err.name === 'FsError'` + `typeof err.kind === 'string'`); when
matched, rethrow as ACP `RequestError(-32603, message, {errorKind,
hint, status})`. The agent's RPC client now receives `data.
errorKind` and can branch on the closed-enum kind.

Cross-package note: FsError lives in `cli/src/serve/fs/errors.ts`
and acp-bridge can't `import { FsError }` from cli (dependency
inversion). Same duck-typing pattern that `mapDomainErrorToErrorKind`
(status.ts) already applies to `TrustGateError` / `SkillError` for
the same cross-package bundling reason — `instanceof` would fail
across package boundaries when bundlers don't dedupe.

**Code shape**

```typescript
function isFsErrorShape(err: unknown): err is FsErrorShape {
  return (
    err instanceof Error &&
    err.name === 'FsError' &&
    typeof (err as { kind?: unknown }).kind === 'string'
  );
}

function preserveFsErrorOverAcp(err: unknown): never {
  if (isFsErrorShape(err)) {
    throw new RequestError(-32603, err.message, {
      errorKind: err.kind,
      ...(err.hint !== undefined ? { hint: err.hint } : {}),
      ...(err.status !== undefined ? { status: err.status } : {}),
    });
  }
  throw err;
}
```

Applied at both `if (this.fileSystem) { ... }` blocks (writeTextFile
+ readTextFile) — wrapped the adapter call in try/catch +
`preserveFsErrorOverAcp(err)`. Non-FsError errors are rethrown
unchanged (default ACP serialization is fine for unstructured
errors; only the structured shape needs preservation).

JSON-RPC code stays at -32603 (internal error) rather than mapping
FsError.kind → JSON-RPC code. Rationale: the JSON-RPC standard
defines only a handful of code values (-32700/-32600/-32601/-32602/
-32603 + a reserved range for application errors), and mapping
~10 FsError kinds to that narrow space is lossy. Instead the
structured `data.errorKind` carries the semantic information SDK
consumers need; JSON-RPC code remains the generic "an error happened"
signal.

**Tests** (+5 in `bridgeClient.test.ts`)

- writeTextFile FsError → ACP RequestError with errorKind in data
- readTextFile FsError preserving symlink_escape kind (no hint
  field present → not stamped, spread guard works)
- non-FsError pass-through (plain Error stays plain Error, no
  RequestError wrap)
- hint field preservation when present
- defensive: error with `kind` field but wrong `name` does NOT get
  wrapped (e.g. PermissionForbiddenError happens to have a kind
  field internally — must NOT be confused for FsError)

Verification: 113/113 acp-bridge tests pass (+5 new FsError-
preservation tests). Full serve suite shows pre-existing F3-related
failures unrelated to this change (verified in isolation).
Comment thread packages/cli/src/serve/server.ts
Comment thread packages/acp-bridge/src/eventBus.ts Outdated
Comment thread packages/sdk-typescript/test/unit/daemonEvents.test.ts
Comment thread packages/acp-bridge/src/bridgeClient.test.ts
…stion)

Adopts all 7 review threads from the first wenshao + Copilot review
round on PR #4360. All technical fixes (no judgment calls).

**[Critical] BridgeTimeoutError constructor blocks tsc** (wenshao
PRRT_kwDOPB-92c6DfcRI)

`server.test.ts:4670` called `new BridgeTimeoutError('initialize
timed out')` but the constructor signature is `(label: string,
timeoutMs: number)` — TS2554 blocked `tsc --noEmit` and `npm run
build`. Fixed to `new BridgeTimeoutError('initialize', 5000)` per
suggested fix; resulting message `"HttpAcpBridge initialize timed
out after 5000ms"` still satisfies the existing
`.toContain('timed out')` assertion.

**[Suggestion] Copilot JSDoc package name** (Copilot
PRRT_kwDOPB-92c6De-Sm, ToolCallEmitter.ts:210)

JSDoc referenced `@qwen-code/core/mcp-tool` but the actual package
is `@qwen-code/qwen-code-core` with the file at
`packages/core/src/tools/mcp-tool.ts`. Updated the reference.

**[Suggestion] Copilot errorKind type widening** (Copilot
PRRT_kwDOPB-92c6De-Ro, events.ts:244)

`DaemonStreamErrorData.errorKind` was typed as `string` and the
JSDoc said "7-value" closed enum — but `DAEMON_ERROR_KINDS` actually
has 8 values, and `SERVE_ERROR_KINDS` (daemon-side) has 9 (adds
`stat_failed`). Typed as `DaemonErrorKind | (string & {})` for
forward-compat: SDK consumers get IDE autocomplete on the known 8
kinds while still accepting future daemon-side additions (like
`stat_failed`) without a type error. Updated JSDoc to accurately
list 8 current values + call out the forward-compat widening.

Side observation (NOT in scope of this PR): `DAEMON_ERROR_KINDS`
(SDK) lacks `stat_failed` that exists in `SERVE_ERROR_KINDS`
(daemon). That's a separate drift fix.

**[Suggestion] TERMINAL wording misleading** (wenshao
PRRT_kwDOPB-92c6Dj-JL, eventBus.ts:369)

Comment called `state_resync_required` a "TERMINAL synthetic frame"
but it's emitted FIRST (before replay) and the stream stays OPEN.
Genuine terminals like `client_evicted` close the stream after the
frame. Rewrote the comment per suggestion: "id-less synthetic
frame... Unlike `client_evicted`, the stream stays OPEN" — so an
oncall reading the source at 3am gets the right mental model.

**[Suggestion] `_meta` merge dead code + stale reference** (wenshao
PRRT_kwDOPB-92c6Dj-JF, server.ts:2569)

The `existingMeta` merge reads `event._meta` at BridgeEvent top
level, but ToolCallEmitter's `_meta` lives nested inside
`event.data._meta` (publish path goes through `events.publish({type:
'session_update', data: params})`). In production `existingMeta` is
always undefined — the merge is a forward-compat escape hatch, not
an active merge. Also the comment referenced
`extractServerTimestamp` (sdk-typescript) which grep confirms
doesn't exist yet (it's planned in chiga0 PR #4353).

Rewrote the comment block to (1) acknowledge no current producer
sets `_meta` at the top level — it's a forward-compat hook for
future envelope-level metadata; (2) drop the stale
`extractServerTimestamp` reference and instead note that chiga0
PR #4353 plans the 3-location probe. Code shape unchanged
(forward-compat spread stays).

**[Suggestion] session_closed + client_evicted passthrough tests**
(wenshao PRRT_kwDOPB-92c6Dj-JW, daemonEvents.test.ts:2284)

`RESYNC_PASSTHROUGH_TYPES` has 5 members but only `session_died`
and `stream_error` had passthrough tests. Added two missing tests:
`session_closed` and `client_evicted` while awaitingResync.
Critical because if a future refactor accidentally drops either
from the set, a consumer in resync limbo would silently swallow
the terminal signal and the UI would hang on "loading resync
state…".

**[Suggestion] readTextFile non-FsError passthrough test** (wenshao
PRRT_kwDOPB-92c6Dj-JX, bridgeClient.test.ts:251)

The non-FsError pass-through test only covered `writeTextFile`.
Added a symmetric `readTextFile` test — the two `try/catch` blocks
in `bridgeClient.ts` are independent, so test parity guards against
divergent refactors (e.g. someone adding wrapping on one side but
not the other).

Verification
- `packages/acp-bridge`: 6 files, 114/114 pass (+1 new readTextFile
  non-FsError test).
- `packages/sdk-typescript`: 75/75 pass on daemonEvents.test.ts
  (+2 new session_closed / client_evicted passthrough tests).
- `packages/cli/src/serve/server.test.ts`: 248 tests pass on
  touched cases (5 SSE / serverTimestamp / stream_error tests).
  Pre-existing F3 (#4335 merge) test failures unrelated to this
  PR's changes — verified by stash-test-restore on clean tree.
- TypeScript clean on touched regions; `BridgeTimeoutError`
  2-arg fix unblocks `tsc --noEmit` for the test file.
@doudouOUC doudouOUC requested a review from wenshao May 20, 2026 23:58
Comment thread packages/acp-bridge/src/eventBus.ts
Comment thread packages/cli/src/serve/server.ts
Comment thread packages/sdk-typescript/test/unit/daemonEvents.test.ts
Adopts all 3 threads from wenshao's second review round on PR #4360.
All Suggestion-level — daemon-side observability + 1 missing SDK
reducer test.

**[Suggestion] SSE ring eviction silently emits state_resync_required**
(PRRT_kwDOPB-92c6Dp_Uk, eventBus.ts:394)

Pre-fix: when a consumer reconnects past the ring boundary, the
daemon emits `state_resync_required` with zero stderr breadcrumb.
A 3am oncall chasing "my UI is frozen with stale state" couldn't
grep daemon logs to distinguish (a) ring undersized, (b) client
reconnecting too slowly, (c) network partition causing repeated
reconnects.

Fix: detect `next.value.type === 'state_resync_required'` in the
SSE handler's iter loop in `server.ts` and emit a `writeStderrLine`
with the gap details (`lastEventId`, `earliestInRing`, computed
`gap` count, `reason`). Logged at the route boundary rather than
inside `EventBus.subscribe` to keep the bus implementation pure +
concentrate daemon-side observability in the route handler that
already logs socket errors + heartbeats.

**[Suggestion] Bridge iterator throw forwarded to client but not
logged daemon-side** (PRRT_kwDOPB-92c6Dp_Uo, server.ts:1956)

Pre-fix inconsistency: the adjacent `res.on('error', ...)` handler
at line ~1925 logs SSE socket errors with `writeStderrLine`, but
the bridge-iterator-catch block at line ~1940-1965 sends a
`stream_error` SSE frame to the client AND swallows the error
daemon-side. When the bridge iterator throws (subprocess crash,
channel protocol error, unhandled rejection), distinguishing
"subprocess OOM-killed" from "protocol bug" required attaching a
debugger.

Fix: mirror the adjacent handler's pattern — add `writeStderrLine`
before the `stream_error` SSE frame send, including the classified
`errorKind` (when available) in brackets so operators can grep for
`[init_timeout]` / `[missing_binary]` etc.

**[Suggestion] No SDK reducer test verifying stream_error.errorKind
flowthrough** (PRRT_kwDOPB-92c6Dp_Uq, daemonEvents.test.ts:2331)

The daemon-side wire format is tested in `server.test.ts`
(`parsed.data.errorKind === 'init_timeout'`) and
`DaemonStreamErrorData` now declares `errorKind?`, but the SDK
reducer test suite never fed a `stream_error` event with
`errorKind` and asserted `state.streamError?.errorKind`. A future
refactor stripping `errorKind` from the reducer's data assignment
(e.g. spreading only `{error}`) would silently regress without
test signal.

Fix: added `captures errorKind on stream_error in view state` test
exercising the full pipeline — reducer receives stream_error with
errorKind, view state's `streamError.errorKind` matches.

Verification
- `packages/sdk-typescript`: 76/76 daemonEvents tests pass (+1
  new flowthrough test).
- `packages/cli/src/serve/server.test.ts`: 6 targeted serverTimestamp
  / stream_error / errorKind tests pass — server.ts changes are
  observability-only (no behavior change to wire format).
- Pre-existing F3 (#4335 merge) test failures elsewhere are
  unrelated to this PR's changes.
Comment thread packages/cli/src/serve/server.ts
Comment thread packages/cli/src/serve/server.ts
…verage)

Adopts both threads from wenshao's third review round on PR #4360.
Both Suggestion-level — pin the daemon-side stderr log artifacts that
commit `dce2fed0f` introduced. Pre-fix: the EventBus-level
state_resync_required emission was tested in eventBus.test.ts, and
the SSE wire shape was tested in server.test.ts, but the actual
operator-facing artifacts (the stderr log lines themselves) had no
test coverage. A regression swapping operands in the `gap`
arithmetic, dropping the sessionId from the log, or breaking the
`[errorKind]` suffix would ship silently and only surface when an
operator went grepping during an incident.

**[Suggestion] SSE ring eviction stderr log untested**
(PRRT_kwDOPB-92c6Dqtlb, server.ts:1948)

Added 2 tests:
- `writes a daemon-side stderr log on SSE ring eviction` — yields
  a `state_resync_required` frame from a fake bridge, spies on
  `process.stderr.write`, asserts the captured log contains
  `session sess-A` + `lastEventId=5` + `earliestInRing=12` +
  `gap=6 events` (pins the arithmetic) + `reason=ring_evicted` +
  `loadSession` (the recovery hint).
- `falls back to "?" placeholders when state_resync_required data is
  partial` — yields a frame with empty `data: {}`, asserts every
  `?? '?'` branch fires (lastEventId=? / earliestInRing=? /
  gap=? events / reason=?). Defensive against future daemon schema
  changes that drop one of these fields.

**[Suggestion] Bridge iterator error stderr log untested**
(PRRT_kwDOPB-92c6Dqtlh, server.ts:1993)

Added 2 tests:
- `writes a daemon-side stderr log on bridge iterator error` — fake
  bridge throws plain `Error('agent died')` mid-stream, captures
  stderr, asserts the log contains `session sess-A` + `agent died`,
  and **no** `[…]` suffix (plain Error → `mapDomainErrorToErrorKind`
  returns undefined → no suffix).
- `includes [errorKind] suffix in bridge iterator error log when
  classified` — fake bridge throws `BridgeTimeoutError('initialize',
  5000)`, asserts the log contains `[init_timeout]`. Pins the
  classified-vs-unclassified branch of the conditional suffix
  template.

All 4 tests use `vi.spyOn(process.stderr, 'write').mockReturnValue(
true)` + filter `mock.calls` for the relevant log prefix — same
pattern as the existing `mcp-client-manager.test.ts` stderr-spy
tests in core, plus `startupProfiler.test.ts` in cli.

Verification: 7/7 targeted observability tests pass. Pre-existing
F3 (#4335 merge) test failures elsewhere are unrelated to this
PR's changes.
Copy link
Copy Markdown
Collaborator

@wenshao wenshao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No issues found. LGTM! ✅ — gpt-5.5 via Qwen Code /review

@doudouOUC doudouOUC requested a review from yiliang114 May 21, 2026 03:03
Copy link
Copy Markdown
Collaborator

@yiliang114 yiliang114 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM — clean protocol-level additions with solid backward compat.

Commit 1 (#19 stamping): _meta.serverTimestamp at the SSE write boundary is the right call — keeps BridgeEvent unpolluted. resolveToolProvenance as a static pure function with duck-typed mcp__ parsing and subagent precedence is well-designed; edge cases (malformed prefix, empty server segment) fall through to builtin cleanly.

Commit 2 (#15 gap detection): The earliestInRing > lastEventId + 1 check with a synthetic id-less state_resync_required frame before replay is straightforward and well-tested — positive cases, boundary off-by-one, empty ring, and fresh subscriber are all covered. SDK reducer gating via awaitingResync with RESYNC_PASSTHROUGH_TYPES for terminal lifecycle frames is a solid safety default.

preserveFsErrorOverAcp duck typing approach is consistent with the existing mapDomainErrorToErrorKind pattern for cross-package boundaries. Test matrix covers write + read paths, non-FsError passthrough, and wrong-name guard.

@doudouOUC doudouOUC merged commit a60c1c5 into daemon_mode_b_main May 21, 2026
6 checks passed
@doudouOUC doudouOUC deleted the feat/f4-prereq-protocol-completion branch May 21, 2026 03:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants