
fix: adapt misleading routing info in admin logs#1949

Closed
steebchen-bot wants to merge 3 commits into main from feat-continue-current-task

Conversation

@steebchen-bot
Collaborator

@steebchen-bot steebchen-bot commented Apr 1, 2026

Summary

  • log retry/success attempts with the exact provider model mapping instead of the canonical model id
  • match failed routing score rows by provider+region so regional failures are annotated correctly
  • clarify the admin routing card by surfacing the selected routing entry and making score ordering/selection easier to read

Validation

  • pnpm build
  • pnpm --filter admin build
  • pnpm --filter admin lint (passes with existing warnings in unrelated admin files)
  • pnpm vitest run apps/gateway/src/chat/tools/retry-with-fallback.spec.ts --no-file-parallelism

Notes

  • pnpm format and pnpm lint still fail repo-wide because of pre-existing worker lint errors
  • pnpm test:unit still fails broadly in existing API/DB/worker suites outside this change

Summary by CodeRabbit

Improvements

  • Enhanced retry and fallback routing logic to more accurately track model information across request attempts
  • Improved failure attribution by refining how provider and region information are matched against failure data during error handling

@coderabbitai
Contributor

coderabbitai Bot commented Apr 1, 2026

Walkthrough

Both streaming and non-streaming retry paths in the chat endpoint now record the provider-specific model mapping (usedModelMapping) instead of the base model name in routing attempt logs. Provider failure enrichment has been updated to key failed attempts by provider identity and region together, instead of by provider alone, enabling region-aware failure attribution.
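
The region-aware enrichment described above can be sketched as follows. This is a minimal illustration, not the gateway's code: providerRetryKey is named in this PR, but its key format here is an assumption, and the RoutingAttempt and ProviderScore shapes and the annotateFailedScores helper are hypothetical.

```typescript
// Hypothetical shapes for illustration; the real types live in the gateway.
interface RoutingAttempt {
	provider: string;
	model: string;
	region?: string;
	status_code: number;
	error_type: string;
	succeeded: boolean;
}

interface ProviderScore {
	providerId: string;
	region?: string;
	score: number;
	failed?: boolean;
	status_code?: number;
	error_type?: string;
}

// Assumed key format: the provider alone, or "provider:region" when a region is set.
function providerRetryKey(providerId: string, region?: string): string {
	return region ? `${providerId}:${region}` : providerId;
}

// Annotate each score row with the failure recorded for the same provider AND region.
function annotateFailedScores(
	scores: ProviderScore[],
	attempts: RoutingAttempt[],
): ProviderScore[] {
	const failedMap = new Map<string, RoutingAttempt>();
	for (const attempt of attempts) {
		if (!attempt.succeeded) {
			failedMap.set(providerRetryKey(attempt.provider, attempt.region), attempt);
		}
	}
	return scores.map((score) => {
		const failure = failedMap.get(providerRetryKey(score.providerId, score.region));
		if (!failure) {
			return score;
		}
		return {
			...score,
			failed: true,
			status_code: failure.status_code,
			error_type: failure.error_type,
		};
	});
}
```

With a provider-only key, a failure in one region would be attached to every row of that provider; keying by both fields keeps the annotation on the row that actually failed.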

Changes

Retry Logging and Provider Attribution

| Layer / File(s) | Summary |
| --- | --- |
| Streaming Retry Logging (apps/gateway/src/chat/chat.ts) | Timeout, network, and HTTP error retry paths record usedModelMapping instead of baseModelName in routing attempts (lines 4749–4752, 5084–5088, 5325–5328, 5624–5627); provider score enrichment now keys failures by providerRetryKey(providerId, region) (lines 5640–5651). |
| Non-streaming Retry Logging (apps/gateway/src/chat/chat.ts) | Fetch, network, and HTTP error retry paths record usedModelMapping instead of baseModelName in routing attempts (lines 8330–8334, 8814–8818, 8901–8904); provider score enrichment now keys failures by providerRetryKey(providerId, region) (lines 8921–8928). |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • theopenco/llmgateway#2033: Fixes setting usedRegion in low-uptime fallback logic, complementing region-aware retry metadata in this PR.
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly summarizes the main objective of fixing misleading routing information in admin logs by updating model attribution and provider matching logic. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 Microsoft Presidio Analyzer (2.2.362)
apps/gateway/src/chat/chat.ts

Microsoft Presidio Analyzer failed to scan this file


Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
apps/gateway/src/chat/chat.ts (1)

4491-4513: ⚠️ Potential issue | 🟡 Minor

Finish the region-aware lookup for rate-limited retry skips.

failedMap is now keyed by provider+region, but the rate-limit skip path still mutates providerScores via provider-only lookups at lines 3691-3695 and 6727-6731. If one provider has multiple regional rows, the wrong row can still be tagged rate_limited, so the admin routing card remains misleading on that path.

♻️ Suggested follow-up
- const scoreEntry = routingMetadata?.providerScores.find(
- 	(s) => s.providerId === nextProvider.providerId,
- );
+ const scoreEntry = routingMetadata?.providerScores.find(
+ 	(s) =>
+ 		providerRetryKey(s.providerId, s.region) ===
+ 		providerRetryKey(nextProvider.providerId, nextProvider.region),
+ );

Apply the same change in both retry loops.

Also applies to: 7515-7538
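
To make the failure mode concrete, here is a small self-contained sketch. The ScoreRow shape, the markRateLimited helper, the sample provider and region names, and the providerRetryKey format are all assumptions for illustration; only the providerRetryKey name comes from this PR.

```typescript
// Hypothetical score row; field names are illustrative only.
interface ScoreRow {
	providerId: string;
	region?: string;
	rate_limited?: boolean;
}

// Assumed key format: "provider:region", or the provider alone without a region.
function providerRetryKey(providerId: string, region?: string): string {
	return region ? `${providerId}:${region}` : providerId;
}

// Tag exactly the row that matches both provider and region.
function markRateLimited(
	rows: ScoreRow[],
	providerId: string,
	region?: string,
): ScoreRow | undefined {
	const wanted = providerRetryKey(providerId, region);
	const row = rows.find(
		(s) => providerRetryKey(s.providerId, s.region) === wanted,
	);
	if (row) {
		row.rate_limited = true;
	}
	return row;
}

const scoreRows: ScoreRow[] = [
	{ providerId: "acme", region: "us-east" },
	{ providerId: "acme", region: "eu-west" },
];

// A provider-only lookup always stops at the first "acme" row (us-east),
// even when the rate-limited provider was the eu-west one.
const providerOnlyMatch = scoreRows.find((s) => s.providerId === "acme");

// The region-aware lookup reaches the intended eu-west row.
const regionAwareMatch = markRateLimited(scoreRows, "acme", "eu-west");
```

In this sketch the provider-only lookup would have tagged the us-east row, which is the mis-attribution the comment describes.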

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/gateway/src/chat/chat.ts` around lines 4491 - 4513, The rate-limit skip
path mutates providerScores using provider-only lookups which mis-tags rows when
a provider has multiple regions; update both retry loops that currently use
provider-only keys to perform region-aware lookups using the same
providerRetryKey(providerId, region) logic and the failedMap created from
routingAttempts so that when tagging a score as rate_limited you locate the
exact regional row (update the two places referenced in the comment to replace
provider-only map/get with providerRetryKey-based map/get and set the
rate_limited flag on the matched regional score).
🧹 Nitpick comments (1)
ee/admin/src/components/log-card.tsx (1)

38-40: Centralize the routing contract instead of re-declaring it here.

This file now duplicates both the RoutingMetadata shape and the provider+region key format that the gateway already owns. The local copy has already drifted (selectedProvider is optional here but required in the canonical type), so a later backend change can silently break the selected badge/sorting in admin. Please move the shared type + key helper into a common module, or derive this type from the shared contract and widen it locally with Partial<> only if backward-compatibility is required.

Also applies to: 99-101
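
One possible shape for the shared contract is sketched below. It is only an illustration: the field list of RoutingMetadata, the key format of getRoutingEntryKey, and the LoggedRoutingMetadata and selectedScoreKey names are assumptions, and in a real setup the first three declarations would live in a common module that both the gateway and admin import.

```typescript
// --- would live in a shared module (hypothetical location and shape) ---
interface ProviderScore {
	providerId: string;
	region?: string;
	score: number;
}

interface RoutingMetadata {
	selectedProvider: string;
	selectionReason: string;
	providerScores: ProviderScore[];
}

// Single source of truth for the provider+region key (format is assumed).
function getRoutingEntryKey(provider: string, region?: string): string {
	return region ? `${provider}:${region}` : provider;
}

// --- admin side: derive from the contract instead of re-declaring it ---
// Partial<> widens the type so older log rows with missing fields still type-check.
type LoggedRoutingMetadata = Partial<RoutingMetadata>;

function selectedScoreKey(
	meta: LoggedRoutingMetadata,
	region?: string,
): string | undefined {
	return meta.selectedProvider
		? getRoutingEntryKey(meta.selectedProvider, region)
		: undefined;
}
```

Because the admin type is derived, a backend change to RoutingMetadata surfaces as a compile error in admin rather than as a silently broken badge.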

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ee/admin/src/components/log-card.tsx` around lines 38 - 40, The file declares
a local RoutingMetadata interface and duplicates the provider+region key format
which is already owned by the gateway (causing drift—e.g., selectedProvider is
optional here but required upstream); replace the local definition by importing
the canonical RoutingMetadata and any provider/region key helper from the
shared/common module (or if necessary derive the local shape as
Partial<RoutingMetadata> to preserve compatibility) and remove the duplicated
key-format logic so the selected badge/sorting logic uses the single source of
truth (look for RoutingMetadata, selectedProvider, selectionReason, and the
provider+region key helper in this file to update).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 9a0bf05d-6f35-4475-944c-8e49a49a142b

📥 Commits

Reviewing files that changed from the base of the PR and between da4dcec and a20aff1.

📒 Files selected for processing (2)
  • apps/gateway/src/chat/chat.ts
  • ee/admin/src/components/log-card.tsx

Comment thread ee/admin/src/components/log-card.tsx Outdated
Comment on lines +149 to +160
const selectedRoutingAttempt = routingMetadata?.routing
	?.slice()
	.reverse()
	.find((attempt) => attempt.succeeded);
const selectedRoutingProvider =
	selectedRoutingAttempt?.provider ??
	routingMetadata?.selectedProvider ??
	log.usedProvider;
const selectedRoutingKey = getRoutingEntryKey(
	selectedRoutingAttempt?.provider ?? selectedRoutingProvider,
	selectedRoutingAttempt?.region,
);
Contributor

⚠️ Potential issue | 🟡 Minor

Don't drop model/region from the selected routing entry.

selectedRoutingAttempt has the exact provider, model, and region, but Line 153 collapses that to provider-only. The new "Selected" row therefore stays ambiguous whenever the same provider is used with multiple mappings or regions.

💡 Suggested change
 const selectedRoutingAttempt = routingMetadata?.routing
 	?.slice()
 	.reverse()
 	.find((attempt) => attempt.succeeded);
 const selectedRoutingProvider =
 	selectedRoutingAttempt?.provider ??
 	routingMetadata?.selectedProvider ??
 	log.usedProvider;
+const selectedRoutingLabel = selectedRoutingAttempt
+	? `${selectedRoutingAttempt.provider}/${selectedRoutingAttempt.model}${
+			selectedRoutingAttempt.region
+				? ` (${selectedRoutingAttempt.region})`
+				: ""
+		}`
+	: selectedRoutingProvider;
 const selectedRoutingKey = getRoutingEntryKey(
 	selectedRoutingAttempt?.provider ?? selectedRoutingProvider,
 	selectedRoutingAttempt?.region,
 );
 ...
-{selectedRoutingProvider && (
+{selectedRoutingLabel && (
 	<div className="flex justify-between">
 		<span className="text-muted-foreground">Selected</span>
-		<span className="font-mono">
-			{selectedRoutingProvider}
-		</span>
+		<span className="font-mono">{selectedRoutingLabel}</span>
 	</div>
 )}

Also applies to: 395-400
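
The same idea can be pulled out of the component into two pure functions, which makes the label easy to unit test. This is a sketch under assumptions: the RoutingAttempt shape and the findSelectedAttempt and formatSelectedLabel names are hypothetical and not part of the PR.

```typescript
// Hypothetical attempt shape, limited to the fields the label needs.
interface RoutingAttempt {
	provider: string;
	model: string;
	region?: string;
	succeeded: boolean;
}

// The last successful attempt wins, mirroring slice().reverse().find(...) above.
function findSelectedAttempt(
	routing: RoutingAttempt[] | undefined,
): RoutingAttempt | undefined {
	return routing
		?.slice()
		.reverse()
		.find((attempt) => attempt.succeeded);
}

// Build "provider/model (region)"; fall back to a bare provider name
// when no successful attempt was logged.
function formatSelectedLabel(
	attempt: RoutingAttempt | undefined,
	fallbackProvider?: string,
): string | undefined {
	if (!attempt) {
		return fallbackProvider;
	}
	const region = attempt.region ? ` (${attempt.region})` : "";
	return `${attempt.provider}/${attempt.model}${region}`;
}
```

Copying the array with slice() before reverse() matters because reverse() mutates in place and the routing array is also rendered in its original order.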

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ee/admin/src/components/log-card.tsx` around lines 149 - 160,
selectedRoutingAttempt currently contains provider, model, and region but the
code only uses its provider when computing selectedRoutingProvider and
selectedRoutingKey, which drops model/region and makes the "Selected" row
ambiguous; update the logic in the selected routing computation (references:
selectedRoutingAttempt, selectedRoutingProvider, selectedRoutingKey,
getRoutingEntryKey) to prefer using selectedRoutingAttempt.provider,
selectedRoutingAttempt.model, and selectedRoutingAttempt.region when present
(fall back to routingMetadata?.selectedProvider or log.usedProvider only for
missing fields), and ensure getRoutingEntryKey is called with the provider plus
the region from selectedRoutingAttempt when available; apply the same fix to the
similar block that uses these symbols later in the file.

@steebchen steebchen changed the title Fix misleading routing info in admin logs fix: adapt misleading routing info in admin logs Apr 2, 2026
- chat.ts: use buildRoutingAttempt() helper from main but preserve
  PR's fix of usedModelMapping over baseModelName for routing attempts
- ee/admin log-card.tsx: adopt SharedLogCard refactor from main
- shared log-card.tsx: port PR's selected-entry features (sorting,
  selected badge, "lower is better" label, selectedProvider row)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Collaborator Author

Resolved merge conflicts with origin/main.

Two files had conflicts:

  1. apps/gateway/src/chat/chat.ts (7 conflicts): main had refactored all routingAttempts.push({...}) inline objects to use a buildRoutingAttempt() helper function, while this PR had changed model: baseModelName to model: usedModelMapping in those same locations. Resolved by using the buildRoutingAttempt() helper from main but passing usedModelMapping to preserve the PR's fix.

  2. ee/admin/src/components/log-card.tsx (1 conflict): main moved the entire LogCard implementation to a shared SharedLogCard component, while this PR had added selected-entry features (sorted provider scores, selected badge, 'lower is better' label). Resolved by accepting main's SharedLogCard refactor and porting the PR's features into packages/shared/src/components/log-card.tsx so they're available everywhere.
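
For readers without the diff at hand, the resolved call sites look roughly like the sketch below. The buildRoutingAttempt name and its argument order follow the diffs in this thread, but the body, the RoutingAttempt shape, and the sample values are assumptions rather than the helper from main.

```typescript
// Hypothetical attempt record; the real shape is defined in the gateway.
interface RoutingAttempt {
	provider: string;
	model: string;
	status_code: number;
	error_type: string;
	succeeded: boolean;
}

// Assumed body for the helper introduced on main.
function buildRoutingAttempt(
	provider: string,
	model: string,
	statusCode: number,
	errorType: string,
	succeeded: boolean,
): RoutingAttempt {
	return {
		provider,
		model,
		status_code: statusCode,
		error_type: errorType,
		succeeded,
	};
}

// Sample values, made up for illustration.
const usedProvider = "acme";
const usedModelMapping = "acme-large-2026-01"; // provider-specific mapping, not the canonical id

const routingAttempts: RoutingAttempt[] = [];

// Conflict resolution: keep the helper from main, pass the mapping from this PR.
routingAttempts.push(
	buildRoutingAttempt(usedProvider, usedModelMapping, 503, "upstream_error", false),
);
```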

Contributor

@coderabbitai coderabbitai Bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
apps/gateway/src/chat/chat.ts (1)

4726-4750: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Model attribution is still inconsistent on same-provider retry paths.

You switched fallback attempts to usedModelMapping, but same-provider retry branches still write baseModelName. That can produce mixed routing[].model values for one request.

Suggested patch
-								buildRoutingAttempt(
-									usedProvider,
-									baseModelName,
+								buildRoutingAttempt(
+									usedProvider,
+									usedModelMapping,
 									0,
 									getErrorType(0),
 									false,
@@
-								buildRoutingAttempt(
-									usedProvider,
-									baseModelName,
+								buildRoutingAttempt(
+									usedProvider,
+									usedModelMapping,
 									0,
 									getErrorType(0),
 									false,
@@
-								buildRoutingAttempt(
-									usedProvider,
-									baseModelName,
+								buildRoutingAttempt(
+									usedProvider,
+									usedModelMapping,
 									res.status,
 									getErrorType(res.status),
 									false,
@@
-								buildRoutingAttempt(
-									usedProvider,
-									baseModelName,
+								buildRoutingAttempt(
+									usedProvider,
+									usedModelMapping,
 									inferredStatusCode,
 									getErrorType(inferredStatusCode),
 									false,
@@
-								buildRoutingAttempt(
-									usedProvider,
-									baseModelName,
+								buildRoutingAttempt(
+									usedProvider,
+									usedModelMapping,
 									inferredStatusCode,
 									getErrorType(inferredStatusCode),
 									false,
@@
-					buildRoutingAttempt(
-						usedProvider,
-						baseModelName,
+					buildRoutingAttempt(
+						usedProvider,
+						usedModelMapping,
 						0,
 						getErrorType(0),
 						false,
@@
-					buildRoutingAttempt(
-						usedProvider,
-						baseModelName,
+					buildRoutingAttempt(
+						usedProvider,
+						usedModelMapping,
 						res.status,
 						getErrorType(res.status),
 						false,

Also applies to: 5061-5086, 5301-5326, 5555-5580, 8306-8331, 8790-8815
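
A quick way to spot this inconsistency in a logged request is sketched below. It assumes that a given provider+region resolves to a single model mapping within one request, which may not hold in every configuration; the RoutingAttempt shape and the findMixedModelEntries helper are hypothetical.

```typescript
// Hypothetical attempt shape, limited to the fields the check needs.
interface RoutingAttempt {
	provider: string;
	region?: string;
	model: string;
}

// Return the provider(+region) keys whose attempts were logged under
// more than one model value within the same request.
function findMixedModelEntries(routing: RoutingAttempt[]): string[] {
	const modelsByEntry = new Map<string, Set<string>>();
	for (const attempt of routing) {
		const key = attempt.region
			? `${attempt.provider}:${attempt.region}`
			: attempt.provider;
		const models = modelsByEntry.get(key) ?? new Set<string>();
		models.add(attempt.model);
		modelsByEntry.set(key, models);
	}
	return Array.from(modelsByEntry.entries())
		.filter(([, models]) => models.size > 1)
		.map(([key]) => key);
}
```

Run against the routing array of a single log entry, a non-empty result suggests that some retry branch still writes baseModelName while others write usedModelMapping.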

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@apps/gateway/src/chat/chat.ts` around lines 4726 - 4750, The same-provider
retry branches still record routing attempts using baseModelName, causing mixed
routing.model values; update those branches to pass usedModelMapping instead of
baseModelName when calling buildRoutingAttempt (the same places that already use
usedModelMapping for fallbacks), e.g., in the blocks that push into
routingAttempts and also call
applyResolvedProviderContext/sameProviderRetryContext and decrement
retryAttempt, ensure buildRoutingAttempt uses usedModelMapping so
routing[].model is consistent across same-provider retries.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: d329d534-d4cc-463b-823d-704be236810b

📥 Commits

Reviewing files that changed from the base of the PR and between a20aff1 and 73b3824.

📒 Files selected for processing (1)
  • apps/gateway/src/chat/chat.ts

@steebchen steebchen closed this May 7, 2026
@steebchen steebchen deleted the feat-continue-current-task branch May 7, 2026 14:29

2 participants