feat(rerank): add configurable HTTP timeout for OpenAI-compatible client #2784

Merged
qin-ctx merged 2 commits into volcengine:main from Dicoangelo:feat/rerank-configurable-timeout on Jun 23, 2026
@Dicoangelo

Contributor

Problem

OpenAIRerankClient hardcodes a 30-second HTTP timeout. Local LLM servers (e.g. llama.cpp on ROCm) have to load the model on a cold start, which often takes longer than 30s, so the first request after a period of inactivity fails with a ReadTimeout error.

Solution

Add a configurable timeout field to RerankConfig (default 30.0, fully backwards-compatible) and thread it through OpenAIRerankClient.__init__, from_config, and the requests.post call in rerank_batch. The timeout can now be set per-environment in ov.conf:

"rerank": {
  "api_key": "...",
  "api_base": "http://localhost:8080/v1/rerank",
  "timeout": 120
}

Subsequent requests still benefit from the already-warm model; only the cold-start first call needs the longer budget.
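
The threading described above can be pictured roughly as follows. This is a simplified sketch, not the code in this PR: the `model` field, the `http_post` injection parameter, the request payload shape, and the raw-JSON return value are assumptions made so the example runs standalone. Only the `timeout` field, its 30.0 default, and the `__init__` / `from_config` / `rerank_batch` path come from the description.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional


@dataclass
class RerankConfig:
    """Hypothetical stand-in for the project's RerankConfig."""
    api_key: str
    api_base: str
    model: str = "rerank-model"  # assumed field, for illustration only
    timeout: float = 30.0        # same value that used to be hardcoded


class OpenAIRerankClient:
    """Simplified sketch; the real client has more fields and error handling."""

    def __init__(self, api_key: str, api_base: str, model: str,
                 timeout: float = 30.0,
                 http_post: Optional[Callable[..., Any]] = None):
        self.api_key = api_key
        self.api_base = api_base
        self.model = model
        self.timeout = timeout
        # Assumption: an injectable transport, so the sketch can be exercised
        # without a running server. The PR itself calls requests.post directly.
        self._http_post = http_post

    @classmethod
    def from_config(cls, config: RerankConfig,
                    http_post: Optional[Callable[..., Any]] = None):
        # The configured timeout is forwarded instead of being dropped.
        return cls(config.api_key, config.api_base, config.model,
                   timeout=config.timeout, http_post=http_post)

    def rerank_batch(self, query: str, documents: List[str]) -> Any:
        post = self._http_post
        if post is None:
            import requests  # imported lazily so the sketch loads without it
            post = requests.post
        response = post(
            self.api_base,
            headers={"Authorization": f"Bearer {self.api_key}"},
            json={"model": self.model, "query": query, "documents": documents},
            timeout=self.timeout,  # previously a fixed 30
        )
        response.raise_for_status()
        return response.json()  # the sketch returns the raw parsed body
```

With a config like the one above, `OpenAIRerankClient.from_config(RerankConfig(api_key="...", api_base="http://localhost:8080/v1/rerank", timeout=120))` would wait up to 120 seconds on each call, while a config that omits `timeout` keeps the old 30-second behaviour.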

Why a config field (not a hardcoded higher value or retries)

  • A higher hardcoded value isn't flexible across server setups.
  • Retry logic masks the root cause instead of fixing it.

Tests

Added tests/unit/models/rerank/test_openai_rerank_timeout.py (7 tests): default timeout, custom timeout, config default, from_config threading (custom + default), and rerank_batch passing the configured/default timeout to requests.post. New tests plus the existing rerank suite pass (14 passed).
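
For readers who want a feel for the `rerank_batch` checks, here is a rough sketch of that style of test. It is not the contents of test_openai_rerank_timeout.py: `StubRerankClient` and the injected `http_post` mock are hypothetical stand-ins, used so the example runs without the project or the `requests` package installed.

```python
from unittest.mock import MagicMock


class StubRerankClient:
    """Hypothetical minimal client: only the timeout plumbing is modelled."""

    def __init__(self, api_base, timeout=30.0, http_post=None):
        self.api_base = api_base
        self.timeout = timeout
        self._http_post = http_post

    def rerank_batch(self, query, documents):
        response = self._http_post(
            self.api_base,
            json={"query": query, "documents": documents},
            timeout=self.timeout,
        )
        return response.json()


def make_mock_post():
    # Mock transport whose response parses to an empty result list.
    mock_post = MagicMock()
    mock_post.return_value.json.return_value = {"results": []}
    return mock_post


def test_default_timeout_is_passed():
    mock_post = make_mock_post()
    client = StubRerankClient("http://localhost:8080/v1/rerank",
                              http_post=mock_post)
    client.rerank_batch("query", ["doc"])
    _, kwargs = mock_post.call_args
    assert kwargs["timeout"] == 30.0


def test_custom_timeout_is_passed():
    mock_post = make_mock_post()
    client = StubRerankClient("http://localhost:8080/v1/rerank",
                              timeout=120, http_post=mock_post)
    client.rerank_batch("query", ["doc"])
    _, kwargs = mock_post.call_args
    assert kwargs["timeout"] == 120
```

Asserting on the keyword arguments of the mocked call keeps the test independent of any real server, so a cold-start delay never has to be reproduced to verify the plumbing.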

Closes #2732

OpenAIRerankClient hardcoded a 30s HTTP timeout, which is insufficient for
local LLM servers (e.g. llama.cpp on ROCm) that incur model cold-start
latency on the first request after inactivity, causing ReadTimeout errors.

Add a `timeout` field to RerankConfig (default 30.0, backwards-compatible)
and thread it through OpenAIRerankClient.__init__, from_config, and the
requests.post call in rerank_batch. The timeout can now be set per-environment
in ov.conf, e.g. "timeout": 120.

Closes volcengine#2732
@qin-ctx qin-ctx merged commit 9ec15c8 into volcengine:main Jun 23, 2026
5 of 6 checks passed
@github-project-automation github-project-automation Bot moved this from Backlog to Done in OpenViking project Jun 23, 2026
@ZaynJarvis ZaynJarvis added scenario:kernel Core server, runtime, storage, retrieval, SDK, CLI, or Studio behavior. urgency:suggestion Feature/RFC/improvement, not immediate bug. labels Jun 23, 2026
Development

Successfully merging this pull request may close these issues.

[Feature]: Configurable HTTP timeout for OpenAI-compatible rerank client
