feat(rerank): add configurable HTTP timeout for OpenAI-compatible client by Dicoangelo · Pull Request #2784 · volcengine/OpenViking

Dicoangelo · 2026-06-23T05:28:31Z

Problem

OpenAIRerankClient hardcodes a 30-second HTTP timeout. When using local LLM servers (e.g. llama.cpp on ROCm) that require cold-start model loading, 30s is often insufficient, causing ReadTimeout errors on the first request after inactivity.

Solution

Add a configurable timeout field to RerankConfig (default 30.0, fully backwards-compatible) and thread it through OpenAIRerankClient.__init__, from_config, and the requests.post call in rerank_batch. The timeout can now be set per-environment in ov.conf:

"rerank": {
  "api_key": "...",
  "api_base": "http://localhost:8080/v1/rerank",
  "timeout": 120
}

Subsequent requests still benefit from the already-warm model; only the cold-start first call needs the longer budget.
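
The threading described above can be pictured with a minimal sketch. It is not the merged code: the real signatures are not shown on this page, so the request payload shape, the Authorization header, and the `post` injection parameter are all assumptions added for illustration. Only the names RerankConfig, OpenAIRerankClient, from_config, rerank_batch, and the 30.0 default come from the description.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional


@dataclass
class RerankConfig:
    api_key: str
    api_base: str
    timeout: float = 30.0  # default matches the previously hardcoded value


class OpenAIRerankClient:
    def __init__(
        self,
        api_key: str,
        api_base: str,
        timeout: float = 30.0,
        post: Optional[Callable[..., Any]] = None,  # hypothetical test hook
    ) -> None:
        self.api_key = api_key
        self.api_base = api_base
        self.timeout = timeout
        self._post = post

    @classmethod
    def from_config(
        cls, config: RerankConfig, post: Optional[Callable[..., Any]] = None
    ) -> "OpenAIRerankClient":
        # Carry the configured timeout from the config object into the client.
        return cls(config.api_key, config.api_base, timeout=config.timeout, post=post)

    def rerank_batch(self, query: str, documents: List[str]) -> Any:
        post = self._post
        if post is None:
            import requests  # imported lazily so the sketch loads without it

            post = requests.post
        response = post(
            self.api_base,
            headers={"Authorization": f"Bearer {self.api_key}"},  # assumed header
            json={"query": query, "documents": documents},  # assumed payload shape
            timeout=self.timeout,  # the configured value replaces the fixed 30
        )
        response.raise_for_status()
        return response.json()
```

With this shape, a config that omits `timeout` behaves exactly as before, while `"timeout": 120` in ov.conf reaches the HTTP call unchanged.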

Why a config field (not a hardcoded higher value or retries)

A higher hardcoded value isn't flexible across server setups.
Retry logic masks the root cause instead of fixing it.

Tests

Added tests/unit/models/rerank/test_openai_rerank_timeout.py (7 tests): default timeout, custom timeout, config default, from_config threading (custom + default), and rerank_batch passing the configured/default timeout to requests.post. New tests plus the existing rerank suite pass (14 passed).
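
For readers unfamiliar with this style of test, here is a hedged sketch of how "passing the configured/default timeout to requests.post" can be asserted without a network call. The `rerank_batch` function below is a hypothetical stand-in, not the project's method, and the helper name is invented for illustration. The actual test file is not reproduced on this page.

```python
from unittest.mock import MagicMock


def rerank_batch(post, api_base, query, documents, timeout=30.0):
    """Hypothetical stand-in for the client method under test."""
    response = post(
        api_base,
        json={"query": query, "documents": documents},  # assumed payload shape
        timeout=timeout,
    )
    response.raise_for_status()
    return response.json()


def forwarded_timeout(timeout=None):
    """Call rerank_batch with a mocked post and return the timeout it received."""
    fake_post = MagicMock()
    fake_post.return_value.json.return_value = {"results": []}
    kwargs = {} if timeout is None else {"timeout": timeout}
    rerank_batch(fake_post, "http://localhost:8080/v1/rerank", "q", ["d"], **kwargs)
    # call_args.kwargs holds the keyword arguments of the most recent call.
    return fake_post.call_args.kwargs["timeout"]
```

A test would then check that `forwarded_timeout()` yields the default and `forwarded_timeout(120)` yields the custom value.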

Closes #2732

OpenAIRerankClient hardcoded a 30s HTTP timeout, which is insufficient for local LLM servers (e.g. llama.cpp on ROCm) that incur model cold-start latency on the first request after inactivity, causing ReadTimeout errors. Add a `timeout` field to RerankConfig (default 30.0, backwards-compatible) and thread it through OpenAIRerankClient.__init__, from_config, and the requests.post call in rerank_batch. The timeout can now be set per-environment in ov.conf, e.g. "timeout": 120. Closes volcengine#2732

github-project-automation Bot added this to OpenViking project Jun 23, 2026

github-project-automation Bot moved this to Backlog in OpenViking project Jun 23, 2026

docs: document rerank timeout config

9d5d2ba

qin-ctx approved these changes Jun 23, 2026

qin-ctx merged commit 9ec15c8 into volcengine:main Jun 23, 2026
5 of 6 checks passed

github-project-automation Bot moved this from Backlog to Done in OpenViking project Jun 23, 2026

blackdeathdrow mentioned this pull request Jun 23, 2026

feat: split rerank HTTP timeout into connect and read timeouts #2733

Closed

1 task

ZaynJarvis added the scenario:kernel (Core server, runtime, storage, retrieval, SDK, CLI, or Studio behavior) and urgency:suggestion (Feature/RFC/improvement, not immediate bug) labels Jun 23, 2026

qin-ctx merged 2 commits into volcengine:main from Dicoangelo:feat/rerank-configurable-timeout
