
feat: audio support for google models with integration into playground#2213

Open
RATCHAW wants to merge 13 commits into theopenco:main from RATCHAW:feat/google-audio-input

Conversation


@RATCHAW RATCHAW commented May 8, 2026

Summary

Adds Google Gemini audio input support to the gateway, end-to-end.

  • Accepts input_audio blocks on /v1/chat/completions and routes them to AI Studio or Vertex with the correct provider-specific MIME.
  • Bills audio (including a modality-aware cached-audio rate), surfaces audio_input_cost in the response, and persists it on the log row plus hourly/daily aggregates.
  • model: "auto" only routes to providers that accept the requested format. Unsupported pairs (e.g. Vertex + aiff) return 400, not 500.
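The accepted request shape can be sketched as below. Only the input_audio { data, format } content part comes from this PR's description; the prompt text and the base64 placeholder are illustrative.

```typescript
// Hypothetical /v1/chat/completions body with an input_audio part.
// The input_audio { data, format } shape follows the schema described above.
const requestBody = {
  model: "auto", // or a specific audio-capable Gemini model
  messages: [
    {
      role: "user",
      content: [
        { type: "text", text: "Transcribe this clip." },
        {
          type: "input_audio",
          input_audio: { data: "<base64-encoded bytes>", format: "wav" },
        },
      ],
    },
  ],
};
```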

What's included

Request path

  • chat/schemas/completions.ts accepts the union of formats supported by either Google variant.
  • transform-google-messages.ts maps formats to provider-correct MIMEs (AI Studio: audio/aac, audio/aiff; Vertex: audio/x-aac plus Vertex-only formats). Exports googleProviderSupportsAudioFormat and UnsupportedAudioFormatError.
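A minimal sketch of that provider-aware MIME resolution, assuming a simple lookup per provider family. Only audio/aac and audio/aiff (AI Studio) and audio/x-aac (Vertex) are taken from the PR text; the other map entries and the name resolveMime are illustrative.

```typescript
// Illustrative per-provider MIME maps; the real maps live in
// transform-google-messages.ts and cover more formats.
const AI_STUDIO_MIME: Record<string, string> = {
  wav: "audio/wav",
  aac: "audio/aac",   // AI Studio spelling (from the PR text)
  aiff: "audio/aiff", // AI Studio only
};
const VERTEX_MIME: Record<string, string> = {
  wav: "audio/wav",
  aac: "audio/x-aac", // Vertex uses a different AAC MIME
};

// Resolve a request format to the MIME the chosen provider expects;
// undefined signals an unsupported (format, provider) pair, which the
// gateway surfaces as a 400.
function resolveMime(format: string, isVertex: boolean): string | undefined {
  return (isVertex ? VERTEX_MIME : AI_STUDIO_MIME)[format];
}
```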

Routing

  • chat.ts + messages-contain-audio.ts detect audio and collect the requested formats; filterEligibleModelProviders skips providers without audio: true and Google providers that can’t handle every requested format.
  • app.ts maps UnsupportedAudioFormatError to 400.
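The 400 mapping can be sketched as follows. The handler shape is an assumption (the real one lives in app.ts and is framework-specific), but the error-to-status translation matches the behavior described above.

```typescript
// Typed error carrying the offending format and the provider family it
// was routed to; mirrors the UnsupportedAudioFormatError named above.
class UnsupportedAudioFormatError extends Error {
  constructor(
    public readonly format: string,
    public readonly providerTarget: string,
  ) {
    super(`Audio format "${format}" is not supported by ${providerTarget}`);
    this.name = "UnsupportedAudioFormatError";
  }
}

// Illustrative global handler: an unsupported format is a client error
// (400), not a server failure (500).
function toHttpError(err: unknown): { status: number; body: object } {
  if (err instanceof UnsupportedAudioFormatError) {
    return {
      status: 400,
      body: {
        error: {
          message: err.message,
          metadata: { format: err.format, provider: err.providerTarget },
        },
      },
    };
  }
  return { status: 500, body: { error: { message: "Internal error" } } };
}
```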

Pricing & billing

  • New fields on ProviderModelMapping: audio?: boolean, inputAudioPrice?: number, cachedInputAudioPrice?: number. 43 chat-capable Gemini mappings flagged audio: true; image-only and Veo entries unflagged.
  • extract-token-usage.ts / parse-provider-response.ts pull per-modality counts from usageMetadata.{promptTokensDetails, cacheTokensDetails}.
  • costs.ts partitions cached text vs cached audio and bills cached audio at cachedInputAudioPrice.
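The cached-vs-billable audio split can be sketched as below. The function name and the per-token price values are placeholders, but the partitioning mirrors the costs.ts behavior described above.

```typescript
// Assumed pricing fields, matching the ProviderModelMapping additions above.
interface AudioPricing {
  inputAudioPrice: number;       // $/token for fresh audio input (placeholder units)
  cachedInputAudioPrice: number; // $/token for cache-hit audio input
}

// Partition total audio input tokens into cached and billable parts and
// price each at its modality-aware rate.
function audioInputCost(
  audioInputTokens: number,
  cachedAudioInputTokens: number,
  p: AudioPricing,
): number {
  const billable = Math.max(audioInputTokens - cachedAudioInputTokens, 0);
  return (
    billable * p.inputAudioPrice +
    cachedAudioInputTokens * p.cachedInputAudioPrice
  );
}
```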

Response shape

  • transform-response-to-openai.ts emits audio_input_cost in cost_details. OpenAPI schema and Responses API mirrors declare it.
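An illustrative fragment of the resulting response shape. Only audio_input_cost inside cost_details (and the audio token accounting) come from this PR; the surrounding numbers and field values are placeholders.

```typescript
// Placeholder values; the shape shows where audio_input_cost surfaces.
const responseFragment = {
  usage: {
    prompt_tokens: 212,
    prompt_tokens_details: { audio_tokens: 180 },
    completion_tokens: 34,
  },
  cost_details: {
    audio_input_cost: 0.00036, // audio portion of the input cost
  },
};
```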

Persistence & analytics

  • New columns on log (audioInputTokens, audioInputCost) and on the four hourly stats tables (audioInputCost). Worker aggregators roll it up.

Playground

  • prompt-input.tsx accepts audio attachments alongside images, with a Music2Icon chip representing each audio file in the attachment row.
  • New audios column on the message table; chat read/write API routes (chats.ts, public-chat-shares.ts) surface it on every message payload.
  • DB migration for all audio-related columns (analytics + message.audios) consolidated into a single file: 1778503042_brave_tomorrow_man.sql.

Tests

  • Unit: MIME mapping, format helper, audio detection, cache math, model-data flag presence.
  • E2E: audio.e2e.ts POSTs an audio message to every audio-priced Google provider and asserts audioTokens × inputAudioPrice matches the returned cost.

Summary by CodeRabbit

  • New Features

    • Add end-to-end audio input support in chat (many formats); audio token and cost fields now appear in responses, usage, and billing.
  • Bug Fixes

    • Return HTTP 400 with a warning for unsupported audio format inputs with clearer error metadata.
  • Tests

    • Added E2E and unit tests covering audio input handling, provider/format mapping, and audio pricing behavior.
  • Chores

    • DB/schema, stats, API/activity/logs, UI, and model metadata updated to store and surface audio token/cost metrics and capability flags.

Accept `input_audio` on chat completions with provider-aware routing,
modality-aware billing, and persistence into log/stats. Bad (format, provider) pairs return 400, not 500.

coderabbitai Bot commented May 8, 2026

Walkthrough

This PR implements end-to-end audio input support: message/schema additions, Google audio MIME handling with an explicit UnsupportedAudioFormatError, provider audio capability and pricing metadata, audio-aware routing and filtering, audio token extraction, audio-aware cost calculation and propagation, DB schema and analytics changes, UI/playground updates, and unit/E2E tests.

Changes

Audio Input Support

  • Message Types and Schema (packages/models/src/types.ts, apps/gateway/src/chat/schemas/completions.ts): added InputAudioContent and isInputAudioContent; extended MessageContent and completionsRequestSchema to accept input_audio { data, format }.
  • Audio Detection Utilities (apps/gateway/src/chat/tools/messages-contain-audio.ts, .spec.ts): added messagesContainAudio and getAudioFormatsFromMessages helpers; tests for detection, format deduplication, and model/provider metadata assertions.
  • Google Audio Format Support (packages/actions/src/transform-google-messages.ts, .spec.ts): added googleProviderSupportsAudioFormat, provider-specific MIME maps, UnsupportedAudioFormatError, MIME resolution, and tests covering accepted/rejected format cases.
  • transformGoogleMessages & Call Sites (packages/actions/src/transform-google-messages.ts, packages/actions/src/prepare-request-body.ts): extended transformGoogleMessages to accept providerId and updated prepareRequestBody calls to forward provider info for provider-aware MIME resolution.
  • Provider Audio Capabilities (packages/models/src/models.ts, packages/models/src/models/google.ts): extended ProviderModelMapping with inputAudioPrice, cachedInputAudioPrice, and an audio flag; populated Google model/provider entries with audio capability and pricing where applicable.
  • Chat Routing with Audio Filtering (apps/gateway/src/chat/chat.ts): integrated audio detection and audioFormats into provider eligibility across routing/fallbacks; rejects incompatible providers and surfaces audio-specific no-provider errors.
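A hedged sketch of that eligibility filter: keep only mappings flagged audio: true whose provider supports every requested format. supportsFormat stands in for googleProviderSupportsAudioFormat, and the Mapping shape is an assumption.

```typescript
interface Mapping {
  providerId: string;
  audio?: boolean;
}

// Drop providers that either lack the audio flag or cannot handle one of
// the formats present in the request.
function filterAudioEligible(
  mappings: Mapping[],
  audioFormats: string[],
  supportsFormat: (providerId: string, format: string) => boolean,
): Mapping[] {
  return mappings.filter(
    (m) =>
      m.audio === true &&
      audioFormats.every((f) => supportsFormat(m.providerId, f)),
  );
}
```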
  • Token Extraction & Parsing (apps/gateway/src/chat/tools/extract-token-usage.ts, apps/gateway/src/chat/tools/parse-provider-response.ts): extracts and exposes audioInputTokens and cachedAudioInputTokens by summing AUDIO-modality entries in Google-style usage metadata.
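The AUDIO-modality summation can be sketched as below; the ModalityTokenCount shape mirrors Google's promptTokensDetails entries, though the exact typings here are assumptions.

```typescript
interface ModalityTokenCount {
  modality: string; // e.g. "TEXT" | "AUDIO" | "IMAGE"
  tokenCount?: number;
}

// Sum the token counts of all AUDIO entries; tolerate missing metadata.
function sumAudioTokens(details?: ModalityTokenCount[]): number {
  return (details ?? [])
    .filter((d) => d.modality === "AUDIO")
    .reduce((sum, d) => sum + (d.tokenCount ?? 0), 0);
}
```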
  • Cost Calculation (apps/gateway/src/lib/costs.ts, .spec.ts): extended calculateCosts to accept audio token inputs, compute cached vs. billable audio splits, and price billable audio with inputAudioPrice (falling back to inputPrice) and cached audio with cachedInputAudioPrice; added unit tests for cached/mixed audio/text scenarios.
  • Response Transformation (apps/gateway/src/chat/tools/transform-response-to-openai.ts): extended CostData, applyExtendedUsageFields, buildUsageObject, and transformResponseToOpenai to accept and propagate audioInputTokens and audioInputCost into usage objects and cost_details.audio_input_cost.
  • Streaming & Non-Streaming Integration (apps/gateway/src/chat/chat.ts): propagated audio token/cost fields through streaming and non-streaming pipelines, including usage extraction, content-filter billing, early/cancel paths, and logs.
  • Responses API Shapes (apps/gateway/src/responses/tools/convert-chat-to-responses.ts, convert-streaming-to-responses.ts): extended Responses API usage shapes to include an optional audio_input_cost in cost_details.
  • API Activity & Logs (apps/api/src/routes/activity.ts, apps/api/src/routes/logs.ts): extended activity and logs routes to aggregate/select audioInputCost and include nullable audioInputTokens/audioInputCost in log responses; updated Zod schemas.
  • UI Types (apps/ui/src/types/activity.ts): added audioInputCost to the DailyActivity and Activity types.
  • Analytics Aggregation (apps/worker/src/services/*-stats-aggregator.ts): added an audioInputCost metric to hourly aggregation fields and AGGREGATE_KEYS so daily/global aggregates include audio costs.
  • Database Schema & Migrations (packages/db/src/schema.ts, packages/db/migrations/*): added audioInputTokens and audioInputCost to the log table and audioInputCost (default 0) to hourly/global stats tables; migration and journal updated.
  • Error Handling (apps/gateway/src/app.ts): added a global error handler for UnsupportedAudioFormatError that logs metadata and returns HTTP 400 JSON with message and metadata.
  • E2E Tests (apps/gateway/src/audio.e2e.ts): added a parameterized E2E suite posting base64 WAV input_audio to /v1/chat/completions, seeding the DB/API key and asserting audio token counts and audio_input_cost.
  • Playground / Chat UI (apps/playground/src/..., apps/playground/package.json): playground model/provider mapping includes audio; chat UI and prompt input accept, render, and attach audio files; dependency bumped.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested reviewers

  • steebchen
  • smakosh
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 44.00%, below the required 80.00% threshold. Resolution: write docstrings for the functions missing them.

✅ Passed checks (4 passed)

  • Linked Issues check: ✅ Passed. Skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check: ✅ Passed. Skipped because no linked issues were found for this pull request.
  • Description check: ✅ Passed. Skipped; CodeRabbit’s high-level summary is enabled.
  • Title check: ✅ Passed. The title "feat: audio support for google models with integration into playground" accurately describes the main change: adding audio support for Google models throughout the system, including the playground UI.



@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 6

🧹 Nitpick comments (4)
apps/gateway/src/chat/tools/transform-response-to-openai.ts (1)

326-565: 💤 Low value

Forward-compat: only Google/Anthropic/AWS Bedrock branches forward audioInputTokens.

Other provider branches (inference.net/together-ai/groq, alibaba, openai/azure/mistral/novita, bytedance, xai, embercloud/zai, default) still call buildUsageObject without forwarding audioInputTokens. This is fine for this PR since only Google supports audio, but when adding audio to another provider later, both the relevant buildUsageObject and applyExtendedUsageFields call sites must also be updated; otherwise prompt_tokens_details.audio_tokens will silently fall back to upstream/0 for those providers. No action required now.

packages/actions/src/transform-google-messages.ts (1)

92-109: 💤 Low value

resolveGoogleAudioMime error target is wrong for unrecognised providers

When providerId is not in either family the code correctly builds the union map (all GoogleAudioFormat values will always resolve), but if the dead-code error path at line 103 were somehow reached the target string would always say "Google AI Studio" regardless of the actual provider — VERTEX_FAMILY.has(providerId ?? "") is false so the ternary falls to the else branch.

Given that the fallback map contains every value in GoogleAudioFormat, this path is presently unreachable. If new formats are added to GoogleAudioFormat without updating the MIME maps the sentinel message becomes actively misleading.

🛡️ Suggested fix – use the actual provider ID as the target label
 	const target = VERTEX_FAMILY.has(providerId ?? "")
 		? "Vertex AI"
-		: "Google AI Studio";
+		: AI_STUDIO_FAMILY.has(providerId ?? "")
+			? "Google AI Studio"
+			: (providerId ?? "unknown provider");
 	throw new UnsupportedAudioFormatError(format, target);
apps/gateway/src/audio.e2e.ts (1)

170-173: 💤 Low value

Unreachable inputPrice fallback

provider.inputAudioPrice is always defined here — line 55 already filters out any provider where inputAudioPrice === undefined. The ?? provider.inputPrice ?? 0 fallback is dead code.

♻️ Proposed cleanup
-		const expected =
-			audioTokens * (provider.inputAudioPrice ?? provider.inputPrice ?? 0);
+		const expected = audioTokens * provider.inputAudioPrice;
packages/actions/src/transform-google-messages.spec.ts (1)

105-122: 💤 Low value

Consider using the .rejects pattern for consistency

The other error-assertion tests in this file use await expect(...).rejects.toBeInstanceOf(...). The manual try/catch here works but produces a confusing failure message if no error is thrown: the sentinel throw new Error("expected throw") gets caught, and the assertion reports that Error is not an instance of UnsupportedAudioFormatError rather than "expected function to throw".

♻️ Idiomatic rewrite using `.rejects`
-	it("attaches format and providerTarget on the error", async () => {
-		try {
-			await transformGoogleMessages(
-				audioMessages("aiff"),
-				false,
-				20,
-				null,
-				undefined,
-				"quartz",
-			);
-			throw new Error("expected throw");
-		} catch (err) {
-			expect(err).toBeInstanceOf(UnsupportedAudioFormatError);
-			const e = err as UnsupportedAudioFormatError;
-			expect(e.format).toBe("aiff");
-			expect(e.providerTarget).toBe("Vertex AI");
-		}
-	});
+	it("attaches format and providerTarget on the error", async () => {
+		const promise = transformGoogleMessages(
+			audioMessages("aiff"),
+			false,
+			20,
+			null,
+			undefined,
+			"quartz",
+		);
+		const err = await promise.catch((e: unknown) => e);
+		expect(err).toBeInstanceOf(UnsupportedAudioFormatError);
+		const e = err as UnsupportedAudioFormatError;
+		expect(e.format).toBe("aiff");
+		expect(e.providerTarget).toBe("Vertex AI");
+	});
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@apps/gateway/src/audio.e2e.ts`:
- Line 114: The test suite currently uses getConcurrentTestOptions() in the
describe call which enables concurrent:true; replace getConcurrentTestOptions()
with getTestOptions() so the suite does not run tests in parallel—update the
describe invocation that references getConcurrentTestOptions() to call
getTestOptions() instead (ensure any import or reference to getTestOptions() is
present and used in place of getConcurrentTestOptions()).

In `@apps/gateway/src/chat/chat.ts`:
- Around line 3759-3760: The cached-response replay paths that call
calculateCosts() are not reconstructing audio token splits, so cached Gemini
audio requests are billed as text-only; update the cache-hit replay logic (the
branches that set audioInputTokens/audioInputCost and later call
calculateCosts()) to rebuild prompt_tokens_details.audio_tokens and any per-part
split (e.g., prompt_tokens_details.audio_tokens.{input,replay} or the structure
calculateCosts() expects) from the cached metadata before invoking
calculateCosts(), ensuring costs.audioInputTokens and costs.audioInputCost are
derived from the reconstructed audio token fields; modify the replay code paths
referenced near the audioInputTokens/audioInputCost assignments (also apply the
same fix at the other occurrence noted) so cached audio requests retain their
audio token/cost data.
- Around line 1811-1823: The auto-selection path is excluding non-Claude auto
candidates for audio requests (because allowedAutoModels contains only Claude),
causing model: "auto" to fall through to the hardcoded claude/anthropic
fallback; to fix, when the original request model === "auto" and hasAudio is
true, relax the allowedAutoModels restriction so you only filter out providers
that explicitly cannot do audio (i.e., keep the provider.audio !== true check
and the audioFormats.every(...) googleProviderSupportsAudioFormat check) but
skip the allowedAutoModels membership test (or expand allowedAutoModels to
include other known audio-capable auto providers like Gemini); update the
selection logic around hasAudio, provider.audio, audioFormats,
googleProviderSupportsAudioFormat and allowedAutoModels so an audio-capable auto
candidate is preserved and the code does not drop into the claude-haiku-4-5 /
anthropic fallback incorrectly.
- Around line 2041-2042: The audio gating (hasAudio and audioFormats) is only
enforced inside the sameProviderMappings.length > 1 branch, allowing
direct/pinned providers or empty eligibleMappings to bypass validation; modify
the provider-selection logic (the code that iterates/filters eligibleMappings
and sameProviderMappings) to always filter a provider's mappings by audio
eligibility using hasAudio and audioFormats before choosing that provider, and
if a provider ends up with zero audio-eligible mappings reject that
provider/request (return an error) instead of falling back to
single-mapping/default-region logic; update the sameProviderMappings handling
and the fallback path so both respect the audio filter and use the filtered
mapping list for selection.

In `@apps/gateway/src/chat/tools/messages-contain-audio.spec.ts`:
- Around line 122-124: In the tests in messages-contain-audio.spec.ts where the
local variable model is checked (currently "if (!model) { continue }"), stop
silently skipping and assert presence instead; replace the continue with an
explicit assertion (e.g., expect(model).toBeDefined() or throw a clear error) so
the test fails if a model ID is missing—apply this change for each occurrence
where model is guarded (the three spots shown) to surface catalog regressions.

In `@packages/models/src/models/google.ts`:
- Line 1822: The Groq mapping for model "gemma2-9b-it" incorrectly sets audio:
true which routes audio input to a text-only model; locate the mapping entry
that references "gemma2-9b-it" in packages/models/src/models/google.ts and
remove the audio: true flag (or explicitly set audio: false/remove audio-related
content types) so this provider/model pair only accepts text inputs and won't be
selected for audio/input_audio chat completions.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: cbe06e32-bab5-4ab9-a048-4f05afbba3c7

📥 Commits

Reviewing files that changed from the base of the PR and between 1dfac9c and 2dc042a.

⛔ Files ignored due to path filters (1)
  • apps/gateway/src/test-fixtures/test-audio.wav is excluded by !**/*.wav
📒 Files selected for processing (25)
  • apps/gateway/src/app.ts
  • apps/gateway/src/audio.e2e.ts
  • apps/gateway/src/chat/chat.ts
  • apps/gateway/src/chat/schemas/completions.ts
  • apps/gateway/src/chat/tools/extract-token-usage.ts
  • apps/gateway/src/chat/tools/messages-contain-audio.spec.ts
  • apps/gateway/src/chat/tools/messages-contain-audio.ts
  • apps/gateway/src/chat/tools/parse-provider-response.ts
  • apps/gateway/src/chat/tools/transform-response-to-openai.ts
  • apps/gateway/src/lib/costs.spec.ts
  • apps/gateway/src/lib/costs.ts
  • apps/gateway/src/responses/tools/convert-chat-to-responses.ts
  • apps/gateway/src/responses/tools/convert-streaming-to-responses.ts
  • apps/worker/src/services/global-stats-aggregator.ts
  • apps/worker/src/services/project-stats-aggregator.ts
  • packages/actions/src/prepare-request-body.ts
  • packages/actions/src/transform-google-messages.spec.ts
  • packages/actions/src/transform-google-messages.ts
  • packages/db/migrations/1778260730_nervous_karnak.sql
  • packages/db/migrations/meta/1778260730_snapshot.json
  • packages/db/migrations/meta/_journal.json
  • packages/db/src/schema.ts
  • packages/models/src/models.ts
  • packages/models/src/models/google.ts
  • packages/models/src/types.ts

Comment on lines +122 to +124
if (!model) {
continue;
}


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Don’t silently skip missing model IDs in these assertions.

Lines 122, 148, and 168 currently continue on missing models, which can hide catalog regressions and still pass the suite. Assert model existence instead.

Suggested fix
-		if (!model) {
-			continue;
-		}
+		expect(model, `Missing model definition for ${id}`).toBeDefined();
+		if (!model) {
+			throw new Error(`Missing model definition for ${id}`);
+		}

Also applies to: 148-150, 168-170


@RATCHAW RATCHAW marked this pull request as draft May 9, 2026 05:41
RATCHAW and others added 2 commits May 9, 2026 12:29
- When the request includes input_audio, only consider provider
  mappings with audio: true and a supported audio format; return 400
  if none qualify.
- Let non-allowlisted models participate in auto-routing when they
  support audio.
- Record audio_tokens for streaming and cached responses.
- Run the audio e2e suite sequentially.
- Remove the incorrect audio: true flag on Groq gemma2-9b-it.
@RATCHAW RATCHAW marked this pull request as ready for review May 9, 2026 11:42
RATCHAW and others added 3 commits May 9, 2026 13:13
- Introduced `audioInputCost` and `audioInputTokens` fields in the activity and logs schemas.
- Updated related calculations and API responses to include audio input costs.
- Ensured consistency across various API interfaces for audio input handling.
RATCHAW added 2 commits May 11, 2026 12:30
- Upgraded `@llmgateway/ai-sdk-provider` to version 3.8.0, introducing new audio capabilities.
- Added `audio` field to model provider mapping schema and updated related components to handle audio attachments.
- Enhanced chat UI to support audio messages, including playback controls for audio files.
- Updated internal models and types to reflect audio capabilities across the application.

This update improves the overall functionality and user experience for audio input handling.

@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (1)
apps/playground/src/components/playground/chat-ui.tsx (1)

798-809: ⚡ Quick win

Consider setting accept="" when neither images nor audio are supported.

When supportsImages and supportsAudio are both false, the accept prop is undefined, which causes matchesAccept in prompt-input.tsx to return true for all file types. Users could paste files, see them appear in the attachments list, but those files won't be sent with the message (lines 597-608 only attach files when the model supports them).

Consider setting accept="" to explicitly reject all files when neither capability is supported, which would trigger the "No files match the accepted types" error and provide clearer feedback.

📋 Suggested improvement
 accept={
 	supportsImages && supportsAudio
 		? "image/*,audio/*"
 		: supportsImages
 			? "image/*"
 			: supportsAudio
 				? "audio/*"
-				: undefined
+				: ""
 }

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 76f38cf5-034d-413c-8706-a53e666454fa

📥 Commits

Reviewing files that changed from the base of the PR and between 7eafe27 and 1b7756b.

⛔ Files ignored due to path filters (5)
  • apps/code/src/lib/api/v1.d.ts is excluded by !**/v1.d.ts
  • apps/playground/src/lib/api/v1.d.ts is excluded by !**/v1.d.ts
  • apps/ui/src/lib/api/v1.d.ts is excluded by !**/v1.d.ts
  • ee/admin/src/lib/api/v1.d.ts is excluded by !**/v1.d.ts
  • pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (8)
  • apps/api/src/routes/internal-models.ts
  • apps/playground/package.json
  • apps/playground/src/components/ai-elements/prompt-input.tsx
  • apps/playground/src/components/playground/chat-page-client.tsx
  • apps/playground/src/components/playground/chat-ui.tsx
  • apps/playground/src/lib/fetch-models.ts
  • apps/playground/src/lib/mapmodels.ts
  • apps/playground/src/lib/types.ts
✅ Files skipped from review due to trivial changes (2)
  • apps/playground/package.json
  • apps/playground/src/lib/types.ts

RATCHAW and others added 5 commits May 11, 2026 13:12
- Introduced `audios` field in message schemas across various routes to handle audio attachments.
- Updated validation schemas to ensure audio content can be included in messages.
- Enhanced chat UI and related components to support audio playback and management.
- Added database migration to accommodate the new `audios` column in the message table.

This update improves the functionality of chat messages by allowing users to send and receive audio files, enhancing the overall user experience.
- Introduced a new UI component for audio attachments in the PromptInput, featuring a Music2Icon for visual representation.
- Updated the attachment handling logic to support audio type, enhancing user interaction with audio files.
- Removed the previous audio playback element to streamline the attachment display.

This update improves the user experience by providing a clearer interface for managing audio attachments.

RATCHAW commented May 11, 2026

@coderabbitai resume


coderabbitai Bot commented May 11, 2026

✅ Actions performed

Reviews resumed.

@RATCHAW RATCHAW changed the title feat: support Google Gemini audio input feat: audio support for google models with integration into playground May 11, 2026
