fix: adapt misleading routing info in admin logs#1949
fix: adapt misleading routing info in admin logs#1949steebchen-bot wants to merge 3 commits intomainfrom
Conversation
WalkthroughBoth streaming and non-streaming retry paths in the chat endpoint now record the provider-specific model mapping ( ChangesRetry Logging and Provider Attribution
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 Microsoft Presidio Analyzer (2.2.362)apps/gateway/src/chat/chat.tsMicrosoft Presidio Analyzer failed to scan this file Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
apps/gateway/src/chat/chat.ts (1)
4491-4513:⚠️ Potential issue | 🟡 MinorFinish the region-aware lookup for rate-limited retry skips.
failedMapis now keyed by provider+region, but the rate-limit skip path still mutatesproviderScoresvia provider-only lookups at Line 3691-Line 3695 and Line 6727-Line 6731. If one provider has multiple regional rows, the wrong row can still be taggedrate_limited, so the admin routing card stays misleading for that path.♻️ Suggested follow-up
- const scoreEntry = routingMetadata?.providerScores.find( - (s) => s.providerId === nextProvider.providerId, - ); + const scoreEntry = routingMetadata?.providerScores.find( + (s) => + providerRetryKey(s.providerId, s.region) === + providerRetryKey(nextProvider.providerId, nextProvider.region), + );Apply the same change in both retry loops.
Also applies to: 7515-7538
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/gateway/src/chat/chat.ts` around lines 4491 - 4513, The rate-limit skip path mutates providerScores using provider-only lookups which mis-tags rows when a provider has multiple regions; update both retry loops that currently use provider-only keys to perform region-aware lookups using the same providerRetryKey(providerId, region) logic and the failedMap created from routingAttempts so that when tagging a score as rate_limited you locate the exact regional row (update the two places referenced in the comment to replace provider-only map/get with providerRetryKey-based map/get and set the rate_limited flag on the matched regional score).
🧹 Nitpick comments (1)
ee/admin/src/components/log-card.tsx (1)
38-40: Centralize the routing contract instead of re-declaring it here.This file now duplicates both the
RoutingMetadatashape and the provider+region key format that the gateway already owns. The local copy has already drifted (selectedProvideris optional here but required in the canonical type), so a later backend change can silently break the selected badge/sorting in admin. Please move the shared type + key helper into a common module, or derive this type from the shared contract and widen it locally withPartial<>only if backward-compatibility is required.Also applies to: 99-101
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ee/admin/src/components/log-card.tsx` around lines 38 - 40, The file declares a local RoutingMetadata interface and duplicates the provider+region key format which is already owned by the gateway (causing drift—e.g., selectedProvider is optional here but required upstream); replace the local definition by importing the canonical RoutingMetadata and any provider/region key helper from the shared/common module (or if necessary derive the local shape as Partial<RoutingMetadata> to preserve compatibility) and remove the duplicated key-format logic so the selected badge/sorting logic uses the single source of truth (look for RoutingMetadata, selectedProvider, selectionReason, and the provider+region key helper in this file to update).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@ee/admin/src/components/log-card.tsx`:
- Around line 149-160: selectedRoutingAttempt currently contains provider,
model, and region but the code only uses its provider when computing
selectedRoutingProvider and selectedRoutingKey, which drops model/region and
makes the "Selected" row ambiguous; update the logic in the selected routing
computation (references: selectedRoutingAttempt, selectedRoutingProvider,
selectedRoutingKey, getRoutingEntryKey) to prefer using
selectedRoutingAttempt.provider, selectedRoutingAttempt.model, and
selectedRoutingAttempt.region when present (fall back to
routingMetadata?.selectedProvider or log.usedProvider only for missing fields),
and ensure getRoutingEntryKey is called with the provider plus the region from
selectedRoutingAttempt when available; apply the same fix to the similar block
that uses these symbols later in the file.
---
Outside diff comments:
In `@apps/gateway/src/chat/chat.ts`:
- Around line 4491-4513: The rate-limit skip path mutates providerScores using
provider-only lookups which mis-tags rows when a provider has multiple regions;
update both retry loops that currently use provider-only keys to perform
region-aware lookups using the same providerRetryKey(providerId, region) logic
and the failedMap created from routingAttempts so that when tagging a score as
rate_limited you locate the exact regional row (update the two places referenced
in the comment to replace provider-only map/get with providerRetryKey-based
map/get and set the rate_limited flag on the matched regional score).
---
Nitpick comments:
In `@ee/admin/src/components/log-card.tsx`:
- Around line 38-40: The file declares a local RoutingMetadata interface and
duplicates the provider+region key format which is already owned by the gateway
(causing drift—e.g., selectedProvider is optional here but required upstream);
replace the local definition by importing the canonical RoutingMetadata and any
provider/region key helper from the shared/common module (or if necessary derive
the local shape as Partial<RoutingMetadata> to preserve compatibility) and
remove the duplicated key-format logic so the selected badge/sorting logic uses
the single source of truth (look for RoutingMetadata, selectedProvider,
selectionReason, and the provider+region key helper in this file to update).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 9a0bf05d-6f35-4475-944c-8e49a49a142b
📒 Files selected for processing (2)
apps/gateway/src/chat/chat.tsee/admin/src/components/log-card.tsx
| const selectedRoutingAttempt = routingMetadata?.routing | ||
| ?.slice() | ||
| .reverse() | ||
| .find((attempt) => attempt.succeeded); | ||
| const selectedRoutingProvider = | ||
| selectedRoutingAttempt?.provider ?? | ||
| routingMetadata?.selectedProvider ?? | ||
| log.usedProvider; | ||
| const selectedRoutingKey = getRoutingEntryKey( | ||
| selectedRoutingAttempt?.provider ?? selectedRoutingProvider, | ||
| selectedRoutingAttempt?.region, | ||
| ); |
There was a problem hiding this comment.
Don't drop model/region from the selected routing entry.
selectedRoutingAttempt has the exact provider, model, and region, but Line 153 collapses that to provider-only. The new "Selected" row therefore stays ambiguous whenever the same provider is used with multiple mappings or regions.
💡 Suggested change
const selectedRoutingAttempt = routingMetadata?.routing
?.slice()
.reverse()
.find((attempt) => attempt.succeeded);
const selectedRoutingProvider =
selectedRoutingAttempt?.provider ??
routingMetadata?.selectedProvider ??
log.usedProvider;
+const selectedRoutingLabel = selectedRoutingAttempt
+ ? `${selectedRoutingAttempt.provider}/${selectedRoutingAttempt.model}${
+ selectedRoutingAttempt.region
+ ? ` (${selectedRoutingAttempt.region})`
+ : ""
+ }`
+ : selectedRoutingProvider;
const selectedRoutingKey = getRoutingEntryKey(
selectedRoutingAttempt?.provider ?? selectedRoutingProvider,
selectedRoutingAttempt?.region,
);
...
-{selectedRoutingProvider && (
+{selectedRoutingLabel && (
<div className="flex justify-between">
<span className="text-muted-foreground">Selected</span>
- <span className="font-mono">
- {selectedRoutingProvider}
- </span>
+ <span className="font-mono">{selectedRoutingLabel}</span>
</div>
)}Also applies to: 395-400
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@ee/admin/src/components/log-card.tsx` around lines 149 - 160,
selectedRoutingAttempt currently contains provider, model, and region but the
code only uses its provider when computing selectedRoutingProvider and
selectedRoutingKey, which drops model/region and makes the "Selected" row
ambiguous; update the logic in the selected routing computation (references:
selectedRoutingAttempt, selectedRoutingProvider, selectedRoutingKey,
getRoutingEntryKey) to prefer using selectedRoutingAttempt.provider,
selectedRoutingAttempt.model, and selectedRoutingAttempt.region when present
(fall back to routingMetadata?.selectedProvider or log.usedProvider only for
missing fields), and ensure getRoutingEntryKey is called with the provider plus
the region from selectedRoutingAttempt when available; apply the same fix to the
similar block that uses these symbols later in the file.
- chat.ts: use buildRoutingAttempt() helper from main but preserve PR's fix of usedModelMapping over baseModelName for routing attempts - ee/admin log-card.tsx: adopt SharedLogCard refactor from main - shared log-card.tsx: port PR's selected-entry features (sorting, selected badge, "lower is better" label, selectedProvider row) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
Resolved merge conflicts with origin/main. Two files had conflicts:
|
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
apps/gateway/src/chat/chat.ts (1)
4726-4750:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winModel attribution is still inconsistent on same-provider retry paths.
You switched fallback attempts to
usedModelMapping, but same-provider retry branches still writebaseModelName. That can produce mixedrouting[].modelvalues for one request.Suggested patch
- buildRoutingAttempt( - usedProvider, - baseModelName, + buildRoutingAttempt( + usedProvider, + usedModelMapping, 0, getErrorType(0), false, @@ - buildRoutingAttempt( - usedProvider, - baseModelName, + buildRoutingAttempt( + usedProvider, + usedModelMapping, 0, getErrorType(0), false, @@ - buildRoutingAttempt( - usedProvider, - baseModelName, + buildRoutingAttempt( + usedProvider, + usedModelMapping, res.status, getErrorType(res.status), false, @@ - buildRoutingAttempt( - usedProvider, - baseModelName, + buildRoutingAttempt( + usedProvider, + usedModelMapping, inferredStatusCode, getErrorType(inferredStatusCode), false, @@ - buildRoutingAttempt( - usedProvider, - baseModelName, + buildRoutingAttempt( + usedProvider, + usedModelMapping, inferredStatusCode, getErrorType(inferredStatusCode), false, @@ - buildRoutingAttempt( - usedProvider, - baseModelName, + buildRoutingAttempt( + usedProvider, + usedModelMapping, 0, getErrorType(0), false, @@ - buildRoutingAttempt( - usedProvider, - baseModelName, + buildRoutingAttempt( + usedProvider, + usedModelMapping, res.status, getErrorType(res.status), false,Also applies to: 5061-5086, 5301-5326, 5555-5580, 8306-8331, 8790-8815
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@apps/gateway/src/chat/chat.ts` around lines 4726 - 4750, The same-provider retry branches still record routing attempts using baseModelName, causing mixed routing.model values; update those branches to pass usedModelMapping instead of baseModelName when calling buildRoutingAttempt (the same places that already use usedModelMapping for fallbacks), e.g., in the blocks that push into routingAttempts and also call applyResolvedProviderContext/sameProviderRetryContext and decrement retryAttempt, ensure buildRoutingAttempt uses usedModelMapping so routing[].model is consistent across same-provider retries.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@apps/gateway/src/chat/chat.ts`:
- Around line 4726-4750: The same-provider retry branches still record routing
attempts using baseModelName, causing mixed routing.model values; update those
branches to pass usedModelMapping instead of baseModelName when calling
buildRoutingAttempt (the same places that already use usedModelMapping for
fallbacks), e.g., in the blocks that push into routingAttempts and also call
applyResolvedProviderContext/sameProviderRetryContext and decrement
retryAttempt, ensure buildRoutingAttempt uses usedModelMapping so
routing[].model is consistent across same-provider retries.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: d329d534-d4cc-463b-823d-704be236810b
📒 Files selected for processing (1)
apps/gateway/src/chat/chat.ts
Summary
Validation
pnpm buildpnpm --filter admin buildpnpm --filter admin lint(passes with existing warnings in unrelated admin files)pnpm vitest run apps/gateway/src/chat/tools/retry-with-fallback.spec.ts --no-file-parallelismNotes
pnpm formatandpnpm lintstill fail repo-wide because of pre-existing worker lint errorspnpm test:unitstill fails broadly in existing API/DB/worker suites outside this changeSummary by CodeRabbit
Improvements