fix: prevent compounding retries between SDK and framework retry layers #5335

Open
Shubhrakanti wants to merge 13 commits into main from shubhra/fix-retries

Conversation

@Shubhrakanti
Contributor

Summary

Fixes a bug where two independent retry layers compound multiplicatively, causing far more HTTP requests than expected, with most retries invisible in logs.

The problem

The framework's LLMStream._main_task() in llm.py implements retry logic controlled by APIConnectOptions (default: max_retry=3, timeout=10s). Separately, the OpenAI SDK client has its own built-in retry mechanism (max_retries, SDK default: 2).

When both layers are active, each framework-level retry triggers the _run() method, which makes an API call through the OpenAI SDK. If that call times out, the SDK silently retries internally before surfacing the error back to the framework, which then retries again. This produces multiplicative behavior:

| Layer | Attempts | Visible in logs? |
| --- | --- | --- |
| Framework (`_main_task`) | 1 initial + 3 retries = 4 | Yes |
| OpenAI SDK (per framework attempt) | 1 initial + 2 retries = 3 | No |
| **Total HTTP requests** | 4 × 3 = 12 | Only 3 retry log lines |

A user expecting 3 retries over ~30s instead sees each of the 4 framework attempts take ~30s (3 × 10s timeouts silently inside the SDK), for a total wait of ~120s and 12 actual HTTP requests, with only 3 retry lines in the logs.
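The compounding can be reproduced with a small simulation. All names below are illustrative stand-ins, not the actual livekit or SDK code, but the counting logic mirrors the two stacked retry loops described above:

```python
# Sketch of two stacked retry layers multiplying the request count.
# Hypothetical helper names; only the loop structure matches the real code.

def sdk_call(counter: dict, sdk_max_retries: int) -> None:
    """Simulates the OpenAI SDK: 1 attempt + sdk_max_retries silent retries."""
    for _ in range(1 + sdk_max_retries):
        counter["http_requests"] += 1  # every attempt is a real HTTP request
    raise TimeoutError("request timed out")

def framework_run(counter: dict, framework_max_retry: int, sdk_max_retries: int) -> None:
    """Simulates LLMStream._main_task: 1 attempt + framework_max_retry retries."""
    for attempt in range(1 + framework_max_retry):
        try:
            sdk_call(counter, sdk_max_retries)
        except TimeoutError:
            counter["logged_retries"] = attempt  # only these appear in logs

counter = {"http_requests": 0, "logged_retries": 0}
framework_run(counter, framework_max_retry=3, sdk_max_retries=2)
print(counter["http_requests"])  # 4 framework attempts x 3 SDK attempts = 12
print(counter["logged_retries"])  # only 3 retries ever logged
```

Setting `sdk_max_retries=0` collapses the inner loop, which is exactly what the fix does.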

The fix

  • inference/llm.py: Add max_retries=0 to the openai.AsyncClient constructor. This was the only client in the codebase that did not set this, causing it to silently use the OpenAI SDK's default of 2 retries. Every other client (openai/llm.py, openai/stt.py, openai/tts.py, openai/responses/llm.py) already had max_retries=0.

  • openai/llm.py: Always set max_retries=0 on the SDK client regardless of user input. Previously, passing max_retries=N would enable SDK-level retries that compound with the framework's retries. Now emits a DeprecationWarning when a non-zero value is passed, guiding users to configure retries exclusively through APIConnectOptions.max_retry.
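A minimal sketch of the guard described in the second bullet, assuming a `NOT_GIVEN` sentinel like the SDK's; this is illustrative, not the PR's exact code:

```python
import warnings

NOT_GIVEN = object()  # stand-in for the SDK's NotGiven sentinel

def make_client_kwargs(max_retries=NOT_GIVEN) -> dict:
    """Force SDK-level retries off; warn if a caller still passes a value."""
    if max_retries is not NOT_GIVEN and max_retries > 0:
        warnings.warn(
            "max_retries is deprecated and ignored; configure retries via "
            "APIConnectOptions.max_retry instead",
            DeprecationWarning,
            stacklevel=2,
        )
    # The SDK client always gets max_retries=0 so only the framework retries.
    return {"max_retries": 0}
```

The key design choice is that the user's value is never forwarded: the warning educates, but the SDK client is unconditionally constructed with `max_retries=0`.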

How to configure retries (before and after this fix)

```python
# Correct way to configure retries — unchanged by this PR
from livekit.agents import Agent
from livekit.agents.types import APIConnectOptions
from livekit.plugins import openai

agent = Agent(
    ...,
    llm=openai.LLM(model="gpt-4.1"),
    conn_options=APIConnectOptions(
        max_retry=3,        # number of retries (framework-level, with logging)
        retry_interval=2.0, # seconds between retries
        timeout=10.0,       # per-attempt timeout
    ),
)
```

Test plan

  • Verify inference.LLM no longer triggers SDK-level retries (only framework retries visible in logs)
  • Verify openai.LLM(max_retries=3) emits a DeprecationWarning and does not pass the value to the SDK client
  • Verify default behavior (no max_retries arg) continues to work with max_retries=0 on the SDK client
  • Confirm linter passes (make lint ✅)
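The second test-plan bullet can be exercised with stdlib warnings capture. The sketch below uses an inline stand-in class mirroring the new `openai.LLM` behavior, since constructing the real client needs API credentials:

```python
import warnings

class FakeLLM:
    """Inline stand-in for the patched openai.LLM (illustrative only)."""
    def __init__(self, max_retries=None):
        if max_retries:  # any non-zero, non-None value is deprecated
            warnings.warn(
                "max_retries is deprecated; use APIConnectOptions.max_retry",
                DeprecationWarning,
                stacklevel=2,
            )
        self.sdk_max_retries = 0  # always 0, regardless of the argument

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    llm = FakeLLM(max_retries=3)

assert llm.sdk_max_retries == 0
assert any(issubclass(w.category, DeprecationWarning) for w in caught)
print("deprecation warning emitted; SDK retries forced to 0")
```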

Made with Cursor

@chenghao-mou chenghao-mou requested a review from a team April 3, 2026 21:37

@devin-ai-integration devin-ai-integration bot left a comment


✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 4 additional findings.


```python
        metadata: NotGivenOr[dict[str, str]] = NOT_GIVEN,
        max_completion_tokens: NotGivenOr[int] = NOT_GIVEN,
        timeout: httpx.Timeout | None = None,
        max_retries: NotGivenOr[int] = NOT_GIVEN,
```
Member


by default it was 0? are you sure this is the issue

```python
)
from .utils import AsyncAzureADTokenProvider

logger = logging.getLogger(__name__)
```
Member


Suggested change:

```diff
-logger = logging.getLogger(__name__)
```

```python
        )

        if is_given(max_retries) and max_retries > 0:
            warnings.warn(
```
Member


We deprecate using the logger in other parts of the code:
