Skip to content

fix(core): add stream idle watchdog for silent responses#4256

Open
Alexxigang wants to merge 1 commit into
QwenLM:mainfrom
Alexxigang:fix/stream-idle-watchdog
Open

fix(core): add stream idle watchdog for silent responses#4256
Alexxigang wants to merge 1 commit into
QwenLM:mainfrom
Alexxigang:fix/stream-idle-watchdog

Conversation

@Alexxigang
Copy link
Copy Markdown
Contributor

Summary

  • What changed:
    • adds an idle watchdog around streamed next() calls in GeminiChat
    • links the caller abort signal into an internal controller so stalled provider streams can be cancelled safely
    • surfaces silent-stream timeouts as InvalidStreamError('STREAM_IDLE_TIMEOUT') and reuses existing transient retry handling
    • adds regression coverage for timeout, timer reset on progress, and env-based disablement
  • Why it changed:
    • some provider SSE streams can go silent without closing, which leaves the CLI hanging indefinitely
  • Reviewer focus:
    • timeout / abort lifecycle cleanup
    • retry behavior after STREAM_IDLE_TIMEOUT
    • env gating (QWEN_CODE_STREAM_IDLE_TIMEOUT_MS, QWEN_CODE_DISABLE_STREAM_WATCHDOG)

Validation

  • Commands run:
    cd packages/core
    npx vitest run src/core/geminiChat.test.ts -t "stream idle watchdog"
    npm run test -- src/core/geminiChat.test.ts
    npm run lint -- src/core/geminiChat.ts src/core/geminiChat.test.ts
    npm run typecheck
    npm run build
  • Prompts / inputs used:
    • mocked silent async generator with no further chunks
    • mocked delayed chunk stream with 30ms gaps under a 50ms timeout
  • Expected result:
    • silent streams retry and then fail instead of hanging forever
    • streams that continue making progress within the timeout complete normally
    • the watchdog can be disabled via env for debugging
  • Observed result:
    • all focused tests pass, plus full geminiChat.test.ts, lint, typecheck, and build
  • Quickest reviewer verification path:
    • set QWEN_CODE_STREAM_IDLE_TIMEOUT_MS=50
    • run the new stream idle watchdog tests in src/core/geminiChat.test.ts
    • inspect makeApiCallAndProcessStream() and processStreamResponse() for cleanup and retry flow
  • Evidence (output, logs, screenshots, video, JSON, before/after, etc.):
    • 91 tests passed in src/core/geminiChat.test.ts
    • npm run lint, npm run typecheck, and npm run build all completed successfully

Scope / Risk

  • Main risk or tradeoff:
    • very slow but still healthy streams may now timeout if the configured threshold is set too low
  • Not covered / not validated:
    • live end-to-end reproduction against a real flaky network/provider stream
  • Breaking changes / migration notes:
    • none
    • new optional env vars: QWEN_CODE_STREAM_IDLE_TIMEOUT_MS, QWEN_CODE_DISABLE_STREAM_WATCHDOG

Testing Matrix

🐗 🐰 🐂
npm run N/A
npx N/A N/A
Docker N/A N/A N/A
Podman N/A N/A N/A
Seatbelt N/A N/A N/A

Testing matrix notes:

  • Verified in the local packages/core workspace only.

Linked Issues / Bugs

Fixes #4177

*
* @example
* ```ts
* ``ts
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Critical] JSDoc @example code fence markers are damaged — double backticks (``ts / ``) instead of triple backticks (```ts / ```). This breaks JSDoc rendering for sendMessageStream.

Suggested change
* ``ts
* ```ts
* const chat = ai.chats.create({model: 'gemini-2.0-flash'});
* const response = await chat.sendMessageStream({
* message: 'Why is the sky blue?'
* });
* for await (const chunk of response) {
* console.log(chunk.text);
* }
* ```

— DeepSeek/deepseek-v4-pro via Qwen Code /review

}, timeoutMs);
});

return Promise.race([nextPromise, timeoutPromise]).finally(clearTimers);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] Each stream chunk in the hot while(true) loop allocates a new Promise.race + 2 setTimeout/clearTimeout pairs + 3 Promise objects via streamWatchdog.next(). For a stream with 20-50 chunks this creates measurable overhead. Consider lifting the timeout timer outside the loop and using timeoutId.refresh() per chunk (Node.js timer refresh API) to eliminate per-chunk timer churn while preserving the idle-detection guarantee.

— DeepSeek/deepseek-v4-pro via Qwen Code /review

},
authType: this.config.getContentGeneratorConfig()?.authType,
persistentMode: isUnattendedMode(),
signal: params.config?.abortSignal,
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] retryWithBackoff receives only params.config?.abortSignal (the user's signal), not the internal streamAbortController.signal. When the stream idle watchdog fires during a retry delay, retryWithBackoff won't notice the internal abort and will wait the full delay before calling apiCall — which will then immediately fail because streamAbortController.signal is already aborted. Consider passing streamAbortController.signal (or a combined signal) to retryWithBackoff so internal aborts skip unnecessary retry delays.

— DeepSeek/deepseek-v4-pro via Qwen Code /review

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add SSE stream idle watchdog to abort hung streams (weak-network hang)

2 participants