feat(telemetry): Phase 2 — tool.blocked_on_user + hook spans (#3731)#4321
feat(telemetry): Phase 2 — tool.blocked_on_user + hook spans (#3731)#4321doudouOUC wants to merge 17 commits into
Conversation
Adds two OTel span types under the existing hierarchical session-tracing infrastructure (#3731 Phase 2; depends on Phase 1 #4126 and Phase 1.5 #4302): 1. `qwen-code.tool.blocked_on_user` — brackets the time a tool spends in awaiting_approval waiting for the user. Child of the tool span. Records decision (proceed_once / proceed_always / cancel / aborted / auto_approved) and source (cli / ide / hook / auto / system). Status stays UNSET — waiting is neither OK nor ERROR. 2. `qwen-code.hook` — wraps each pre/post-hook fire site so a slow hook can be told from a slow tool. Records hook_event (PreToolUse / PostToolUse / PostToolUseFailure), tool_name, shouldProceed, shouldStop, blockType, hasAdditionalContext. Status stays UNSET on intentional blocking decisions; ERROR only when the hook itself throws. To make blocked_on_user a child of the tool span, the tool span lifecycle moved from `executeSingleToolCall` to `_schedule`'s validating-loop — covering validating → awaiting_approval → executing in one span. Two new private Maps on CoreToolScheduler hold span refs across method boundaries (callId-keyed). Centralized cleanup via `finalizeToolSpan` / `finalizeBlockedSpan` private helpers ensures every terminal status path also ends the corresponding span. Eight terminal sites now finalize the tool span: signal.aborted at loop entry, hard deny, plan-mode block, non-interactive deny, permission-hook deny, background-agent deny, _schedule catch, executeSingleToolCall finally. Five blocked_on_user end sites: handleConfirmationResponse cancel and proceed branches, autoApproveCompatiblePendingTools, _schedule catch under signal.aborted, and the global-error catch. ModifyWithEditor stays inside one blocked_on_user span until the final proceed/cancel — the duration_ms reflects total user think-time including editor side trips. Six hook fire sites are wrapped: firePreToolUseHook, firePostToolUseHook, and four safelyFirePostToolUseFailureHook variants (success-path interrupt, toolResult.error path, catch-path interrupt, catch-path real exception). fireNotificationHook is intentionally NOT wrapped — it's fire-and-forget and the duration is meaningless. Mirrors claude-code's session-tracing pattern but deliberately diverges on one point: every end-helper takes the span object explicitly via `getSpanId(span)` lookup instead of `findLast`-by-type. Under concurrent tool calls, claude-code's findLast can end the wrong blocked span; passing the ref directly is concurrency-safe. Tests: - session-tracing.test.ts: 11 new tests covering parent resolution (explicit parent for blocked_on_user, ALS-based for hook), idempotent end, NOOP behavior, error-status mapping, and a concurrency regression test (two parallel blocked spans ended in reverse order). - coreToolScheduler.test.ts: mock extended with the four new helpers and two new metadata fields. New tests cover the tool span outliving a pre-hook deny path, blocked_on_user ending with cancel via the awaiting_approval flow, hook span recording shouldProceed=false / blockType='denied' on pre-hook block and shouldStop=true / blockType='stop' on post-hook stop, and a leak guard that asserts every recorded lifecycle span is ended after a successful tool call. 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
📋 Review SummaryThis PR implements Phase 2 of the hierarchical session-tracing plan (#3731), adding two new OTel span types ( 🔍 General Feedback
🎯 Specific Feedback🟢 Medium
🔵 Low
✅ Highlights
|
There was a problem hiding this comment.
Pull request overview
Adds Phase 2 hierarchical session-tracing spans to improve observability of (1) time spent awaiting user approval for tool calls and (2) hook execution latency/behavior around tool calls.
Changes:
- Introduces
qwen-code.tool.blocked_on_userspan helpers and wires them into the tool confirmation state machine. - Introduces
qwen-code.hookspan helpers and wraps pre/post hook fire sites inCoreToolScheduler. - Updates telemetry exports/constants and expands unit tests for new span types and scheduler lifecycle behavior.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| packages/core/src/telemetry/session-tracing.ts | Adds span helpers for tool.blocked_on_user and hook, including end semantics and attributes. |
| packages/core/src/telemetry/session-tracing.test.ts | Adds unit tests covering parenting, attributes, idempotency, and concurrency behavior for new spans. |
| packages/core/src/telemetry/index.ts | Re-exports new helpers and related types. |
| packages/core/src/telemetry/constants.ts | Adds span name constants for the two new span types. |
| packages/core/src/core/coreToolScheduler.ts | Moves tool span lifecycle to cover validate→approval→execute; creates/ends blocked-on-user + hook spans at relevant sites. |
| packages/core/src/core/coreToolScheduler.test.ts | Extends scheduler telemetry tests to assert new spans are created/ended and metadata is propagated. |
Comments suppressed due to low confidence (2)
packages/core/src/core/coreToolScheduler.ts:2026
- Same issue as earlier: the fallback tool-span creation passes
{ tool_name, call_id }intostartToolSpan, which will record nonstandard attributes and omit'tool.call_id'. Use namespaced attribute keys (e.g.{ 'tool.call_id': callId }) and avoid duplicating tool name (already set by the first argument).
let toolSpan = this.toolSpans.get(callId);
if (!toolSpan) {
toolSpan = startToolSpan(toolName, {
tool_name: toolName,
call_id: callId,
});
this.toolSpans.set(callId, toolSpan);
packages/core/src/core/coreToolScheduler.ts:1829
- Proceed path always finalizes the blocked span with
source: 'cli', buthandleConfirmationResponse()can be invoked by IDE diff resolution as well (seeopenIdeDiffIfEnabled). Suggest: choosesource: 'ide'when in IDE mode / when the confirmation came from the IDE integration, otherwisecli.
// Proceed: end the blocked span before execution begins. ProceedOnce
// and the three ProceedAlways* variants all close the awaiting phase.
// The tool span itself stays open and is finalized in
// executeSingleToolCall.
const decision: ToolBlockedDecision =
outcome === ToolConfirmationOutcome.ProceedOnce
? 'proceed_once'
: 'proceed_always';
this.finalizeBlockedSpan(callId, decision, 'cli');
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Code Coverage Summary
CLI Package - Full Text ReportCore Package - Full Text ReportFor detailed HTML reports, please see the 'coverage-reports-22.x-ubuntu-latest' artifact from the main CI run. |
… + silent-failure-hunter Eight discrete fixes plus two new tests, all surfaced in the Phase 2 review rounds. Grouped here because they touch the same handful of code paths. Copilot inline (#4321 PR): 1. startToolSpan attrs naming: drop redundant `tool_name` (helper already sets `'tool.name'` from the first arg) and rename `call_id` to the namespaced `'tool.call_id'`. Two sites: `_schedule` validating-loop start, and the defensive fallback in executeSingleToolCall. Without this, traces emit non-namespaced `tool_name` / `call_id` attributes that consumers grepping for `tool.call_id` miss. 2. PreToolUse hook span: propagate the actual `preHookResult.blockType` ('denied' / 'ask' / 'stop') instead of collapsing every block to 'denied'. Also record `hasAdditionalContext` for parity with the PostToolUse / failure-hook spans. 3. blocked_on_user `source` detection: use `config.getIdeMode()` (best- effort) so IDE-driven decisions don't all show up as `'cli'`. Centralized in a new `getBlockedSource()` helper. silent-failure-hunter / code-reviewer: 4. Hook span error-tracking is dead code. firePreToolUseHook / firePostToolUseHook / safelyFirePostToolUseFailureHook all swallow throws internally — every `catch (e) { endMeta = { error, ... }; throw e }` block in the scheduler was unreachable. Simplify all 6 sites to `try { ... } finally { endHookSpan(...) }`. The default `endMeta = { success: false }` keeps the span sensible if a future hook impl decides to throw. 5. handleConfirmationResponse had no error handling. modifyWithEditor / _applyInlineModify / attemptExecutionOfScheduledCalls can throw and would otherwise leak both the tool span and the blocked_on_user span until the 30-min TTL fires. Wrap the body in a try/catch that finalizes both spans on rethrow. Extracted the body to `_handleConfirmationResponseInner` for clarity. 6. Add `'error'` to the `ToolBlockedDecision` union for system-error closes, so dashboards counting `decision: 'cancel'` don't get polluted by thrown exceptions. 7. _schedule's outer catch was labelling its non-aborted close as `'cancel'`. Switch to `'error'` (uses #6). 8. signal.aborted vs explicit user Cancel: when both are true, the old code reported `'aborted'/'system'` even though the user actually clicked Cancel. Reverse the precedence so `outcome === Cancel` wins, with `getBlockedSource()` for the source. Tests: - T1: extend the existing ProceedAlways auto-approve test to assert the two siblings' blocked spans end with `decision: 'auto_approved'`, `source: 'auto'`, while the first tool ends as `'proceed_always'`/cli. - T2: existing cancel-during-confirmation test now also asserts exactly one blocked span is recorded for the lifecycle — the same invariant ModifyWithEditor's intentional preservation across editor side trips must not break. 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
Follow-up commit
|
… behaviors Two follow-ups from the post-#6767469b2 review pass on PR #4321: 1. autoApproveCompatiblePendingTools error path was logging-only and leaving the sibling tool's blocked_on_user span open until the 30-min TTL fires. Symmetric with the success branch's finalizeBlockedSpan('auto_approved', 'auto'), the catch now finalizes with ('error', 'system') so the trace deterministically explains why the sibling didn't auto-approve. 2. Three behaviors introduced by 6767469 had no test coverage: - decision='error' from _schedule's outer catch when getConfirmationDetails throws (asserts tool span ends, no blocked span ever opens since the throw happens pre-awaiting_approval). - source='ide' when getBlockedSource() honors getIdeMode (Cancel path with getIdeMode: () => true). - Explicit Cancel takes precedence over a concurrent signal.aborted in the decision label — the bug the precedence flip was meant to fix is now regression-tested. Extracted a small `buildApprovalScheduler` helper for the two awaiting_approval-flow tests; the throw-on-confirmation test reuses StructuredErrorOnConfirmationTool. 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
Follow-up commit
|
…odex P3) The previous commit 32f94d3 added a `finalizeBlockedSpan(callId, 'error', 'system')` to the autoApproveCompatiblePendingTools catch in the name of "symmetry with the success branch". Codex review pointed out the bug: that catch fires when evaluatePermissionFlow throws for a SIBLING tool, but the sibling itself is still in `awaiting_approval` — the user can still respond. By closing the blocked span at the catch, the eventual handleConfirmationResponse → finalizeBlockedSpan call becomes a no-op (Map.delete already cleared it), and the user's actual decision / source attributes are lost from the trace. Revert that line. The previous behavior was correct: log the error, leave the span open, let the user's eventual decision close it correctly. If the user never responds, the 30-min TTL in session-tracing.ts cleans up the orphan span — same fallback that already covered every other "user walks away" scenario. The "leak" the original change was trying to fix was a phantom: the span IS finalized once the user (or the abort signal) drives the tool to a terminal state. The TTL is just the safety net. 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
Codex review disposition (
|
wenshao
left a comment
There was a problem hiding this comment.
[Critical] coreToolScheduler.ts:1835 — ModifyWithEditor silent failure leaks blocked + tool spans
When getPreferredEditor() returns undefined, the bare return in _handleConfirmationResponseInner leaks both the blocked span (created in _schedule) and the tool span. No log, no user feedback — the tool silently stays in awaiting_approval until the 30-min TTL fires.
if (!editorType) {
debugLogger.warn(
`ModifyWithEditor requested for ${callId} but no editor available`,
);
this.finalizeBlockedSpan(callId, 'error', 'system');
return;
}
…ecision Two #4321 review comments from wenshao, both Critical: 1. `TOOL_FAILURE_KIND_PRE_HOOK_BLOCKED` was being emitted for FIVE distinct non-PreToolUse-hook deny paths in `_schedule`: - finalPermission === 'deny' (hard deny) - plan-mode block - non-interactive deny - permission_request hook deny - background-agent deny Dashboards filtering by `failure_kind = 'pre_hook_blocked'` were silently picking up all of these, undermining the attribute. Add distinct constants + status messages for each path. The original PRE_HOOK_BLOCKED label is now used at exactly one site — the actual PreToolUse hook deny in `_executeToolCallBody`. 2. `decision: 'proceed_once'` was untested. Existing tests covered 'cancel' and 'proceed_always' (auto-approve) but not the most common user interaction. Add a test that schedules an approval-required tool, confirms with ProceedOnce, and asserts the blocked span ends with `decision: 'proceed_once'`, `source: 'cli'`. 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
Three review items folded into one follow-up: 1. wenshao Critical (`coreToolScheduler.ts:1851`) — `ModifyWithEditor` path silently returned when `getPreferredEditor()` was undefined, leaking blocked + tool spans on user-walks-away. Add a `debugLogger.warn` so the silent failure is at least visible in debug telemetry. Deliberately do NOT finalize spans here, matching the Codex P3 / autoApprove decision: ModifyWithEditor stays inside one awaiting period, the user can still recover via Cancel/Proceed which closes the spans correctly, and the 30-min TTL is the safety net for give-up scenarios. Finalizing prematurely would make the user's eventual decision a no-op (Map already cleared) and lose the actual decision/source attributes. 2. Bot summary Medium (`session-tracing.ts:557-562`) — add a `debugLogger.debug` when `startToolBlockedOnUserSpan` falls back to `resolveParentContext` because the tool span isn't in `activeSpans` anymore. Helps diagnose unexpected ordering during development. 3. Bot summary Low (`constants.ts`) — JSDoc the two new span name constants. 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
Follow-up commit
|
| # | Item | Disposition |
|---|---|---|
| Medium 1 | session-tracing.ts:557-562 debug log on fallback |
✅ Adopted |
| Medium 2 | finalizeToolSpan / finalizeBlockedSpan not in telemetry exports |
❌ Push back — they're explicitly private scheduler-state-machine helpers, not telemetry public API |
| Low 1 | debugLogger.error instead of warn on span end failure |
❌ Push back — span end failures are extremely rare, recoverable, and don't break the service. warn is the existing convention |
| Low 2 | Extract safelyEndSpan helper |
❌ Push back — DRY across 3 sites with ~5 lines each isn't worth the indirection. Existing code structure is consistent |
| Low 3 | JSDoc on new SPAN constants | ✅ Adopted |
| Low 4 | Be more explicit about ProceedAlways* outcomes |
❌ Push back — comment is accurate; listing ProceedAlways / ProceedAlwaysProject / ProceedAlwaysUser would be noise |
Verification
- 191 / 191 tests pass
tsc --noEmitclean- pre-commit clean
🤖 Generated with Qwen Code
…ToolSpan param Two #4321 review Suggestions from wenshao: 1. The 6 hook fire sites (PreToolUse, PostToolUse, 4× PostToolUseFailure) each repeated the same try/finally + endMeta init + endHookSpan pattern. Future hook span protocol changes had to be made in lockstep. Extract a private generic helper: withHookSpan<T>(opts, fn, toEndMeta): Promise<T> Each fire site collapses from ~12 lines of try/finally scaffolding to ~3 lines passing in the fire callback + endMeta builder. The `let postHookResult!:` definite-assignment hack at the PostToolUse site is gone because the helper returns the awaited result directly. 2. `finalizeToolSpan(callId, metadata?)` had a dead `metadata` parameter — every caller pre-sets the span status via `setToolSpan{Failure,Cancelled}` and called `finalizeToolSpan` with no argument. Removed the parameter. 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
…id back-compat Three #4321 review threads from wenshao (#4321 codex P3-equivalent + two structural concerns): 1. **[Critical] Hook spans reported success on swallowed hook failures.** firePreToolUseHook / firePostToolUseHook / firePostToolUseFailureHook (and the safelyFire wrapper in coreToolScheduler) all catch transport / dispatch errors internally and return safe defaults. Before this fix, withHookSpan's `toEndMeta` ran on the safe default and recorded `success: true` — a crashing hook was indistinguishable from one that allowed execution. Add a `hookError?: string` field to the three result types, populate it in each catch, and have all 6 toEndMeta callbacks return `{ success: false, error: hookError }` when present. Existing "graceful error" tests updated to expect the new field. 2. **[Suggestion] ensureCleanupInterval not kicked from new helpers.** The 30-min TTL cleanup safety net for leaked spans only starts when `startInteractionSpan` is first called. Sub-agent or side-query code paths that call `startToolBlockedOnUserSpan` / `startHookSpan` without an interaction span first never trigger cleanup. Both helpers now call the (idempotent) `ensureCleanupInterval()` early. 3. **[Suggestion] `call_id` → `'tool.call_id'` rename is breaking for downstream consumers.** Phase 1's `startToolSpan(name, { tool_name, call_id })` shipped non-namespaced attribute keys. My Phase 2 #4321 review-fix dropped both. Dual-emit `call_id` (legacy alias) + `'tool.call_id'` for one release cycle so existing dashboards / alerts don't silently return zero. Comment notes the legacy key is removed in the next release. 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
Inline review findings: - coreToolScheduler.ts: signal.abort drains scheduler-local toolSpans/blockedSpans Maps via deferred setTimeout(0) — bridges the gap between session-tracing's 30-min TTL (which ends underlying spans but cannot reach the Maps) and walk-away-during-awaiting_approval. The drain is deferred so explicit Cancel via handleConfirmationResponse and mid-execution setToolSpanCancelled paths still win the race and set canonical labels. - coreToolScheduler.test.ts: regression test for permission_hook_denied (firePermissionRequestHook deny branch at _schedule:1683) and background_agent_denied (getShouldAvoidPermissionPrompts auto-deny at _schedule:1697). Both branches were untested — silently dropping setToolSpanFailure on either would lose attribution. - coreToolScheduler.ts: defensive-fallback span in executeSingleToolCall uses canonicalToolName(toolName) so dashboards grouping by span name don't see two entries for migrated/MCP tools whose canonical and raw names differ. Review-body finding: - session-tracing.ts: TTL safety net stamps qwen-code.span.ttl_expired + qwen-code.span.duration_ms attributes and emits a debug log before ending stale spans. Operators can now distinguish "abandoned and garbage-collected by the safety net" from "deliberately ended without status/attrs". Refactored cleanup loop into sweepStaleSpans(now) and exposed runTTLSweepForTesting for unit coverage. Tests: +3 scheduler tests (~220 LOC), +2 session-tracing tests (~36 LOC). 247/247 in affected files. 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
Follow-up commit
|
Adopted ([Critical]): - coreToolScheduler.ts: ModifyWithEditor `!editorType` path now sets `qwen-code.tool.modify_with_editor_unavailable: true` on the live tool span so operators can detect the silent-bail-out state in production traces without enabling debug logging. - coreToolScheduler.test.ts: regression test for plan_mode_blocked failure_kind path (ApprovalMode.PLAN + non-read-only confirmation tool). - coreToolScheduler.test.ts: regression test for the pre-aborted signal early-exit in `_schedule` — asserts setToolSpanCancelled (UNSET status) without entering execution. Adopted ([Suggestion]): - coreToolScheduler.ts: `withHookSpan` now `catch`-es and surfaces the actual thrown message instead of the hardcoded `'hook fn threw before toEndMeta'` sentinel. Currently unreachable (hook helpers swallow internally) but defensive against contract drift. - coreToolScheduler.ts: re-add `tool_name` (non-namespaced) as a legacy alias on both startToolSpan call sites, mirroring the `call_id` / `tool.call_id` dual-emit window so pre-Phase-2 dashboards filtering on `tool_name` don't silently stop matching during the rollout. - coreToolScheduler.test.ts: regression test for the `_schedule`-driven aborted decision label on the blocked_on_user span (companion to the existing tool-span drain test). - coreToolScheduler.ts: PreToolUse / PostToolUse `toEndMeta` now include `shouldProceed: true` / `shouldStop: false` when `hookError` is set, mirroring the runtime's allow-on-hook-failure semantics. Pushed back (separate PR-level reply): - "sibling failure prematurely closes confirmed tool span" — not reachable: `_executeToolCallBody` swallows execution errors so the only paths into `handleConfirmationResponse`'s catch are `originalOnConfirm` / `modifyWithEditor` / `_applyInlineModify`, none of which run after `attemptExecutionOfScheduledCalls` started any sibling. - "PostToolUseFailure hook spans not asserted" — broader scope, defer. - "finalizeToolSpan accept required metadata" — invariant-redesign, out of scope for this PR. Tests: +3 scheduler tests; 250/250 green in affected files (coreToolScheduler 154 + session-tracing 49 + toolHookTriggers 47). 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
Follow-up commit
|
- coreToolScheduler.ts: handleConfirmationResponse outer catch now
branches on signal.aborted — a throw caused by the abort signal
(e.g. ModifyWithEditor child interrupted by Ctrl+C) lands as
decision:'aborted'/UNSET status instead of 'error'/tool_exception,
matching the sister catch in `_schedule` and keeping dashboard
abort-vs-error counts honest (Critical-shaped Suggestion).
- coreToolScheduler.ts: drop the per-batch abort listener at the end
of `_schedule` when no batch entries remain in toolSpans /
blockedSpans. Prevents Node's MaxListenersExceededWarning in
long-lived sessions where the same AbortSignal sees many _schedule
batches without a real abort. Listeners that still cover
awaiting_approval entries stay attached — the user's eventual
decision closes the spans, and the listener becomes a no-op when it
later fires (or auto-removes via `{ once: true }` on real abort).
- coreToolScheduler.test.ts: 2 regression tests for PostToolUseFailure
hook span variants — `is_interrupt:true` on user-abort vs
`is_interrupt:false` on real-exception. Operators rely on this flag
to separate user-initiated cancellations from system errors in
dashboards; a copy-paste regression flipping the value across the 4
PostToolUseFailure call sites was previously invisible.
Tests: 252/252 across affected files (coreToolScheduler 156 +
session-tracing 49 + toolHookTriggers 47).
🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
Follow-up commit
|
wenshao
left a comment
There was a problem hiding this comment.
Additional suggestions (not on diff lines):
-
session-tracing.ts:69— The comment "Phase 2 forward-declarations (no start*/end* helpers wired yet)" is now stale since this PR wiresstartToolBlockedOnUserSpan,endToolBlockedOnUserSpan,startHookSpan, andendHookSpan. Consider updating to reflect that helpers are now wired. -
coreToolScheduler.ts:1654—firePermissionRequestHookis the only hook fire site on the approval critical path that is NOT wrapped withwithHookSpan. All 6 PreToolUse/PostToolUse/PostToolUseFailure sites are wrapped. Operators have no hook-span telemetry for the permission approval flow — a slow policy server would show up as tool-span latency with no hook-level breakdown.
— qwen-latest-series-invite-beta-v34 via Qwen Code /review
Adopted ([Critical]):
- coreToolScheduler.ts: full per-batch abort listener cleanup. Replaced
the closure-local Set + end-of-_schedule cleanup with a class-level
callIdToBatch Map keyed off a shared BatchAbortState. The listener is
now released by `finalizeToolSpan` → `releaseBatchListenerIfDrained`
whenever the last live batch entry drains, regardless of whether
finalize happens synchronously inside _schedule, later via
handleConfirmationResponse, or via executeSingleToolCall. Closes
the awaiting_approval-batches-leak-listeners gap from the previous
partial fix.
- coreToolScheduler.ts: re-check signal.aborted in the _schedule
for-loop after `evaluatePermissionFlow`/`getConfirmationDetails`/
`firePermissionRequestHook` and BEFORE setting awaiting_approval +
starting the blocked span. Without this, a signal that aborts during
one of those awaits opens a blocked span on an already-aborted
signal whose drainSpansForBatch may have already fired, leaving the
new entry permanently orphaned.
- session-tracing.ts: introduce truncateSpanError(s) (1KB cap) and
apply it to every endXSpan site that writes metadata.error to span
attributes / status messages (LLM, tool, tool execution, hook).
Hook server responses, raw exception stacks, or hostile inputs can
be unbounded; some OTel backends drop the entire span when any
field exceeds their limit.
Adopted ([Suggestion]):
- coreToolScheduler.ts: per-callId try/catch inside drainSpansForBatch.
One bad finalize no longer skips the rest of the batch; failures
are logged via debugLogger.warn instead of bubbling up as an
unhandled timer-callback exception.
- session-tracing.ts: TTL sweep robustness — wraps setAttributes and
span.end() in separate try/catch blocks so a setAttributes throw
can't leak the OTel span; stamps `decision: 'aborted'`/
`source: 'system'` on TTL-expired blocked_on_user spans so
dashboards filtering by decision count walk-aways consistently with
explicit user aborts; includes tool.name + tool.call_id in the
warn log so it's actionable in production without a trace-backend
lookup.
- coreToolScheduler.ts: extract the 4 byte-identical PostToolUseFailure
toEndMeta lambdas into a single `postToolUseFailureEndMeta` member.
Future protocol changes only need to touch one place.
- coreToolScheduler.test.ts: 3 new tests
* outer-catch aborted branch — pre-aborted signal + throwing
onConfirm asserts decision='aborted'/source='system' and
failure_kind='cancelled'.
* ModifyWithEditor !editorType — uses a getModifyContext-shimmed
MockEditTool to enter the modifiable branch and asserts
qwen-code.tool.modify_with_editor_unavailable=true.
* per-batch listener removed when batch drains synchronously —
asserts AbortSignal listenerCount and `callIdToBatch` size.
Pushed back (deferred):
- "firePermissionRequestHook in withHookSpan + hookError field" —
same as previous deferral. Touches the public PermissionRequestHookResult
type re-exported from packages/core/src/index.ts; declined per the
guardrail on public-API changes.
Tests: 255/255 across affected files (coreToolScheduler 159 +
session-tracing 49 + toolHookTriggers 47).
🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
Follow-up commit
|
- session-tracing.ts: rename `SPAN_ERROR_MAX_BYTES` → `SPAN_ERROR_MAX_CHARS` and update the JSDoc to be honest that `truncateSpanError` truncates by UTF-16 code units rather than bytes. CJK/emoji-heavy errors land in the ~2-3KB UTF-8 range under the same code-unit cap, but that's still well under all major OTel backends' per-attribute limits (Jaeger/Honeycomb ~64KB, OTLP default ~32KB), so we keep the simpler char-count bound rather than paying the encoder cost on every endXSpan. - coreToolScheduler.ts: move the `withHookSpan` JSDoc block to sit directly above the method. The previous order had two consecutive JSDoc blocks separated by `postToolUseFailureEndMeta`, which orphaned the `withHookSpan` doc — IDE hover tooltips would surface the wrong documentation. Tests: 208/208 in affected files; tsc --noEmit clean. 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
wenshao
left a comment
There was a problem hiding this comment.
Additional observations (not on diff lines):
-
firePermissionRequestHooknot wrapped withwithHookSpan(coreToolScheduler.ts:1734): The PR wraps 6 hook fire sites with span instrumentation, butfirePermissionRequestHook— a decision-making hook that can return allow/deny — is not wrapped. TheHookEventtype also lacks a'PermissionRequest'variant. This makes the permission hook's latency and decision invisible in traces. If this exclusion is intentional, a comment at the call site noting the rationale would help future maintainers. -
endToolBlockedOnUserSpannever sets ERROR status (session-tracing.ts:706): When a blocked span is closed withdecision: 'error'(e.g., system exception during approval flow), the span status stays UNSET. Operators filtering bySpanStatusCode.ERRORin trace backends won't surface these. Consider setting ERROR status whenmetadata?.decision === 'error'. -
Test coverage gaps (pattern): Three specific new code paths lack test coverage: (1)
truncateSpanError— no test passes an oversized string to verify truncation; (2) TTL sweeptool.blocked_on_userbranch — the type-specificdecision: 'aborted'stamping is untested; (3)endHookSpanidempotency — the blocked_on_user span has an idempotency test but the hook span does not.
— qwen-latest-series-invite-beta-v34 via Qwen Code /review
Adopted ([Suggestion]):
- coreToolScheduler.ts: `setToolSpanFailure` now applies
`truncateSpanError` to the status message at this single ingress
point. Many of its 10+ call sites pass raw `error.message` which can
be unbounded — the same backend-drop risk that drove
`truncateSpanError` for the endXSpan attribute writes. Static-
constant callers see no change since their messages are well under
the 1024-char cap. Required exporting `truncateSpanError` from
`session-tracing.ts` and re-exporting from `telemetry/index.ts`.
- coreToolScheduler.ts: in `_schedule`, after the for-loop runs to
completion, drop the abort listener if `batchState.callIds.size === 0`.
Closes the all-error-batch leak path: if every newToolCall had
`status !== 'validating'` (e.g., invalid params, tool not registered,
queue full), no `finalizeToolSpan` ever fires for the batch and
`releaseBatchListenerIfDrained` is never invoked. Without this drop,
one dead listener accumulates per all-error batch.
- coreToolScheduler.ts: `handleConfirmationResponse` outer catch now
emits a `debugLogger.warn` before rethrowing. Without it, if the
caller (CLI confirmation UI layer) doesn't log the rejection, the
error disappears from application logs entirely — operators
grepping by callId would see nothing despite the trace backend
showing `failure_kind: tool_exception`.
- session-tracing.test.ts: 4 new tests
* `truncateSpanError` returns short strings unchanged
* `truncateSpanError` truncates over 1024 chars + appends sentinel
* `truncateSpanError` boundary at exactly 1024 chars
* TTL sweep stamps `decision: 'aborted'` + `source: 'system'` on
blocked_on_user spans (covers the branch added in review-3 round)
Pushed back ([Suggestion]):
- "TTL sweep can't reach scheduler-local Maps" — accurate but the fix
is non-trivial: a parallel scheduler-side TTL sweep duplicates the
session-tracing sweep's bookkeeping, and the practical impact is
bounded (Maps die with the scheduler instance, which is per-session
in CLI mode). The bigger leak (listener accumulation on shared
signals) is already covered by `releaseBatchListenerIfDrained`.
Marking as out-of-scope architectural follow-up.
Tests: 259/259 across affected files (coreToolScheduler 159 +
session-tracing 53 + toolHookTriggers 47). `tsc --noEmit` clean.
🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
- coreToolScheduler.test.ts: convert the `truncateSpanError` mock from an inline identity function to `vi.fn(identity)` so individual tests can substitute a sentinel return. Added regression test `setToolSpanFailure forwards the truncateSpanError result to the span status (#4321)` that overrides the spy with `<<TRUNCATED-SENTINEL>>`, drives the scheduler through the pre-hook deny path, and asserts the span's ERROR status message equals the sentinel — locks the integration so a regression dropping the `truncateSpanError(message)` call inside `setToolSpanFailure` is caught at the scheduler boundary rather than only at the utility's unit test. Tests: 213/213 across `coreToolScheduler.test.ts` (160) + `session-tracing.test.ts` (53). `tsc --noEmit` clean. 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
Summary
Phase 2 of the hierarchical session-tracing plan in #3731. Adds two OTel span types on top of the infrastructure landed by Phase 1 (#4126) and Phase 1.5 (#4302).
qwen-code.tool.blocked_on_user— brackets the time a tool spends inawaiting_approvalwaiting for the user. Child of the tool span. Recordsdecision(proceed_once/proceed_always/cancel/aborted/auto_approved) andsource(cli/ide/hook/auto/system). Status stays UNSET — waiting is neither OK nor ERROR.qwen-code.hook— wraps each pre/post-hook fire site so a slow hook can be told from a slow tool. Recordshook_event(PreToolUse/PostToolUse/PostToolUseFailure),tool_name,shouldProceed,shouldStop,blockType,hasAdditionalContext. Status stays UNSET on intentional blocking; ERROR only when the hook throws.Architectural change
To make
blocked_on_usera child of the tool span, the tool span lifecycle moved fromexecuteSingleToolCallto_schedule's validating-loop — coveringvalidating → awaiting_approval → executingin one span. Two new private Maps onCoreToolSchedulerhold span refs across method boundaries (callId-keyed). Centralized cleanup viafinalizeToolSpan/finalizeBlockedSpanensures every terminal status path also ends the corresponding span._schedulecatch,executeSingleToolCallfinally.blocked_on_userend sites:handleConfirmationResponsecancel + proceed,autoApproveCompatiblePendingTools,_schedulecatch under signal.aborted, global-error catch. ModifyWithEditor stays inside one span until final proceed/cancel —duration_msreflects total user think-time including editor side trips.firePreToolUseHook,firePostToolUseHook, foursafelyFirePostToolUseFailureHookvariants.fireNotificationHook(permission_prompt) intentionally NOT wrapped — fire-and-forget, duration is meaningless.Diverges from claude-code on one point
claude-code's
endToolBlockedOnUserSpanusesfindLast-by-type to locate the span — under concurrent tool calls this can end the wrong span. Our helpers take the span object explicitly viagetSpanId(span)lookup. Theconcurrent blocked spanstest asserts the regression that bug would otherwise produce.Closes Phase 2 in #3731
Ticks off these checklist items in the parent issue under
### Deeper observability (P3) → Phase 2:tool.blocked_on_userspan type + helper; wire into approval state machine incoreToolScheduler._schedulehookspan type + helper; wire into pre/post hook execution incoreToolScheduler.executeSingleToolCallTest plan
pnpm typecheck— cleannpx vitest run src/telemetry/session-tracing.test.ts src/core/coreToolScheduler.test.ts— 187 / 187 pass (47 telemetry helper tests + 140 scheduler tests)npx vitest run(full core package) — only the preexistinganthropicContentGeneratorUser-Agent test + a fewcrawler.test.ts/gitDiff.test.tsfilesystem-timing failures remain. Confirmed unrelated to telemetry — none touch the changed files.npx eslint— cleanRollback
This PR moves the tool span lifecycle. The full set of changes is contained in one commit (
38ba22d9c); revert is a single git revert. The new helpers are additive — disabling them would only require dropping the calls incoreToolScheduler.tsand removing the new exports fromsession-tracing.ts.🤖 Generated with Qwen Code
Local verification (tmux + vitest)
Run the script below in a tmux pane to verify the PR locally.
Captured output and expected results are inlined.
Script
Run via tmux
Expected
TYPECHECK_OKline printed.Test Files 2 passed (2)andTests 187 passed (187).✓.Captured output (this PR, branch
feat/telemetry-phase-2-spans@6767469b2)Coverage map
parents the blocked span under the explicitly-passed tool spanrecords decision/source attributes on end and leaves status UNSEThandles concurrent blocked spans without findLast confusiontool span is started in _schedule and ended even when pre-hook deniesblocked_on_user span ends with cancel when the user rejectshandleConfirmationResponseevery span recorded in a successful tool call is endedauto-approve remaining tool calls ... ProceedAlwaysdecision: 'auto_approved',source: 'auto'hook span records shouldProceed=false / blockType=denied when pre-hook blockshook span records shouldStop=true when post-hook stops executionmarks status ERROR only when the hook itself threw