
feat: audio support for google models with integration into playground#2213

Open
RATCHAW wants to merge 13 commits into theopenco:main from RATCHAW:feat/google-audio-input

Conversation


@RATCHAW RATCHAW commented May 8, 2026

Summary

Adds Google Gemini audio input support to the gateway, end-to-end.

  • Accepts input_audio blocks on /v1/chat/completions and routes them to AI Studio or Vertex with the correct provider-specific MIME.
  • Bills audio (including a modality-aware cached-audio rate), surfaces audio_input_cost in the response, and persists it on the log row plus hourly/daily aggregates.
  • model: "auto" only routes to providers that accept the requested format. Unsupported pairs (e.g. Vertex + aiff) return 400, not 500.
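The accepted request shape can be sketched as below. Only the input_audio { data, format } content part comes from this PR's description; the prompt text and the base64 placeholder are illustrative.

```typescript
// Hypothetical /v1/chat/completions body with an input_audio part.
// The input_audio { data, format } shape follows the schema described above.
const requestBody = {
  model: "auto", // or a specific audio-capable Gemini model
  messages: [
    {
      role: "user",
      content: [
        { type: "text", text: "Transcribe this clip." },
        {
          type: "input_audio",
          input_audio: { data: "<base64-encoded bytes>", format: "wav" },
        },
      ],
    },
  ],
};
```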

What's included

Request path

  • chat/schemas/completions.ts accepts the union of formats supported by either Google variant.
  • transform-google-messages.ts maps formats to provider-correct MIMEs (AI Studio: audio/aac, audio/aiff; Vertex: audio/x-aac plus Vertex-only formats). Exports googleProviderSupportsAudioFormat and UnsupportedAudioFormatError.
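A minimal sketch of that provider-aware MIME resolution, assuming a simple lookup per provider family. Only audio/aac and audio/aiff (AI Studio) and audio/x-aac (Vertex) are taken from the PR text; the other map entries and the name resolveMime are illustrative.

```typescript
// Illustrative per-provider MIME maps; the real maps live in
// transform-google-messages.ts and cover more formats.
const AI_STUDIO_MIME: Record<string, string> = {
  wav: "audio/wav",
  aac: "audio/aac",   // AI Studio spelling (from the PR text)
  aiff: "audio/aiff", // AI Studio only
};
const VERTEX_MIME: Record<string, string> = {
  wav: "audio/wav",
  aac: "audio/x-aac", // Vertex uses a different AAC MIME
};

// Resolve a request format to the MIME the chosen provider expects;
// undefined signals an unsupported (format, provider) pair, which the
// gateway surfaces as a 400.
function resolveMime(format: string, isVertex: boolean): string | undefined {
  return (isVertex ? VERTEX_MIME : AI_STUDIO_MIME)[format];
}
```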

Routing

  • chat.ts + messages-contain-audio.ts detect audio and collect the requested formats; filterEligibleModelProviders skips providers without audio: true and Google providers that can’t handle every requested format.
  • app.ts maps UnsupportedAudioFormatError to 400.
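The 400 mapping can be sketched as follows. The handler shape is an assumption (the real one lives in app.ts and is framework-specific), but the error-to-status translation matches the behavior described above.

```typescript
// Typed error carrying the offending format and the provider family it
// was routed to; mirrors the UnsupportedAudioFormatError named above.
class UnsupportedAudioFormatError extends Error {
  constructor(
    public readonly format: string,
    public readonly providerTarget: string,
  ) {
    super(`Audio format "${format}" is not supported by ${providerTarget}`);
    this.name = "UnsupportedAudioFormatError";
  }
}

// Illustrative global handler: an unsupported format is a client error
// (400), not a server failure (500).
function toHttpError(err: unknown): { status: number; body: object } {
  if (err instanceof UnsupportedAudioFormatError) {
    return {
      status: 400,
      body: {
        error: {
          message: err.message,
          metadata: { format: err.format, provider: err.providerTarget },
        },
      },
    };
  }
  return { status: 500, body: { error: { message: "Internal error" } } };
}
```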

Pricing & billing

  • New fields on ProviderModelMapping: audio?: boolean, inputAudioPrice?: number, cachedInputAudioPrice?: number. 43 chat-capable Gemini mappings flagged audio: true; image-only and Veo entries unflagged.
  • extract-token-usage.ts / parse-provider-response.ts pull per-modality counts from usageMetadata.{promptTokensDetails, cacheTokensDetails}.
  • costs.ts partitions cached text vs cached audio and bills cached audio at cachedInputAudioPrice.
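The cached-vs-billable audio split can be sketched as below. The function name and the per-token price values are placeholders, but the partitioning mirrors the costs.ts behavior described above.

```typescript
// Assumed pricing fields, matching the ProviderModelMapping additions above.
interface AudioPricing {
  inputAudioPrice: number;       // $/token for fresh audio input (placeholder units)
  cachedInputAudioPrice: number; // $/token for cache-hit audio input
}

// Partition total audio input tokens into cached and billable parts and
// price each at its modality-aware rate.
function audioInputCost(
  audioInputTokens: number,
  cachedAudioInputTokens: number,
  p: AudioPricing,
): number {
  const billable = Math.max(audioInputTokens - cachedAudioInputTokens, 0);
  return (
    billable * p.inputAudioPrice +
    cachedAudioInputTokens * p.cachedInputAudioPrice
  );
}
```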

Response shape

  • transform-response-to-openai.ts emits audio_input_cost in cost_details. OpenAPI schema and Responses API mirrors declare it.
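An illustrative fragment of the resulting response shape. Only audio_input_cost inside cost_details (and the audio token accounting) come from this PR; the surrounding numbers and field values are placeholders.

```typescript
// Placeholder values; the shape shows where audio_input_cost surfaces.
const responseFragment = {
  usage: {
    prompt_tokens: 212,
    prompt_tokens_details: { audio_tokens: 180 },
    completion_tokens: 34,
  },
  cost_details: {
    audio_input_cost: 0.00036, // audio portion of the input cost
  },
};
```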

Persistence & analytics

  • New columns on log (audioInputTokens, audioInputCost) and on the four hourly stats tables (audioInputCost). Worker aggregators roll it up.

Playground

  • prompt-input.tsx accepts audio attachments alongside images, with a Music2Icon chip representing each audio file in the attachment row.
  • New audios column on the message table; chat read/write API routes (chats.ts, public-chat-shares.ts) surface it on every message payload.
  • DB migration for all audio-related columns (analytics + message.audios) consolidated into a single file: 1778503042_brave_tomorrow_man.sql.

Tests

  • Unit: MIME mapping, format helper, audio detection, cache math, model-data flag presence.
  • E2E: audio.e2e.ts POSTs an audio message to every audio-priced Google provider and asserts audioTokens × inputAudioPrice matches the returned cost.

Summary by CodeRabbit

  • New Features

    • Add end-to-end audio input support in chat (many formats); audio token and cost fields now appear in responses, usage, and billing.
  • Bug Fixes

    • Return HTTP 400 with a warning for unsupported audio format inputs with clearer error metadata.
  • Tests

    • Added E2E and unit tests covering audio input handling, provider/format mapping, and audio pricing behavior.
  • Chores

    • DB/schema, stats, API/activity/logs, UI, and model metadata updated to store and surface audio token/cost metrics and capability flags.

Accept `input_audio` on chat completions with provider-aware routing,
modality-aware billing, and persistence into log/stats. Bad (format, provider) pairs return 400, not 500.

coderabbitai Bot commented May 8, 2026

Walkthrough

This PR implements end-to-end audio input support: message/schema additions, Google audio MIME handling with an explicit UnsupportedAudioFormatError, provider audio capability and pricing metadata, audio-aware routing and filtering, audio token extraction, audio-aware cost calculation and propagation, DB schema and analytics changes, UI/playground updates, and unit/E2E tests.

Changes

Audio Input Support

  • Message Types and Schema (packages/models/src/types.ts, apps/gateway/src/chat/schemas/completions.ts): added InputAudioContent and isInputAudioContent; extended MessageContent and completionsRequestSchema to accept input_audio { data, format }.
  • Audio Detection Utilities (apps/gateway/src/chat/tools/messages-contain-audio.ts, .spec.ts): added messagesContainAudio and getAudioFormatsFromMessages helpers; tests for detection, format deduplication, and model/provider metadata assertions.
  • Google Audio Format Support (packages/actions/src/transform-google-messages.ts, .spec.ts): added googleProviderSupportsAudioFormat, provider-specific MIME maps, UnsupportedAudioFormatError, MIME resolution, and tests covering accepted/rejected format cases.
  • transformGoogleMessages & Call Sites (packages/actions/src/transform-google-messages.ts, packages/actions/src/prepare-request-body.ts): extended transformGoogleMessages to accept providerId and updated prepareRequestBody calls to forward provider info for provider-aware MIME resolution.
  • Provider Audio Capabilities (packages/models/src/models.ts, packages/models/src/models/google.ts): extended ProviderModelMapping with inputAudioPrice, cachedInputAudioPrice, and an audio flag; populated Google model/provider entries with audio capability and pricing where applicable.
  • Chat Routing with Audio Filtering (apps/gateway/src/chat/chat.ts): integrated audio detection and audioFormats into provider eligibility across routing/fallbacks; rejects incompatible providers and surfaces audio-specific no-provider errors.
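A hedged sketch of that eligibility filter: keep only mappings flagged audio: true whose provider supports every requested format. supportsFormat stands in for googleProviderSupportsAudioFormat, and the Mapping shape is an assumption.

```typescript
interface Mapping {
  providerId: string;
  audio?: boolean;
}

// Drop providers that either lack the audio flag or cannot handle one of
// the formats present in the request.
function filterAudioEligible(
  mappings: Mapping[],
  audioFormats: string[],
  supportsFormat: (providerId: string, format: string) => boolean,
): Mapping[] {
  return mappings.filter(
    (m) =>
      m.audio === true &&
      audioFormats.every((f) => supportsFormat(m.providerId, f)),
  );
}
```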
  • Token Extraction & Parsing (apps/gateway/src/chat/tools/extract-token-usage.ts, apps/gateway/src/chat/tools/parse-provider-response.ts): extracts and exposes audioInputTokens and cachedAudioInputTokens by summing AUDIO-modality entries in Google-style usage metadata.
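The AUDIO-modality summation can be sketched as below; the ModalityTokenCount shape mirrors Google's promptTokensDetails entries, though the exact typings here are assumptions.

```typescript
interface ModalityTokenCount {
  modality: string; // e.g. "TEXT" | "AUDIO" | "IMAGE"
  tokenCount?: number;
}

// Sum the token counts of all AUDIO entries; tolerate missing metadata.
function sumAudioTokens(details?: ModalityTokenCount[]): number {
  return (details ?? [])
    .filter((d) => d.modality === "AUDIO")
    .reduce((sum, d) => sum + (d.tokenCount ?? 0), 0);
}
```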
  • Cost Calculation (apps/gateway/src/lib/costs.ts, .spec.ts): extended calculateCosts to accept audio token inputs, compute cached vs. billable audio splits, and price billable audio with inputAudioPrice (falling back to inputPrice) and cached audio with cachedInputAudioPrice; added unit tests for cached/mixed audio/text scenarios.
  • Response Transformation (apps/gateway/src/chat/tools/transform-response-to-openai.ts): extended CostData, applyExtendedUsageFields, buildUsageObject, and transformResponseToOpenai to accept and propagate audioInputTokens and audioInputCost into usage objects and cost_details.audio_input_cost.
  • Streaming & Non-Streaming Integration (apps/gateway/src/chat/chat.ts): propagated audio token/cost fields through streaming and non-streaming pipelines, including usage extraction, content-filter billing, early/cancel paths, and logs.
  • Responses API Shapes (apps/gateway/src/responses/tools/convert-chat-to-responses.ts, convert-streaming-to-responses.ts): extended Responses API usage shapes to include an optional audio_input_cost in cost_details.
  • API Activity & Logs (apps/api/src/routes/activity.ts, apps/api/src/routes/logs.ts): extended activity and logs routes to aggregate/select audioInputCost and include nullable audioInputTokens/audioInputCost in log responses; updated Zod schemas.
  • UI Types (apps/ui/src/types/activity.ts): added audioInputCost to the DailyActivity and Activity types.
  • Analytics Aggregation (apps/worker/src/services/*-stats-aggregator.ts): added an audioInputCost metric to hourly aggregation fields and AGGREGATE_KEYS so daily/global aggregates include audio costs.
  • Database Schema & Migrations (packages/db/src/schema.ts, packages/db/migrations/*): added audioInputTokens and audioInputCost to the log table and audioInputCost (default 0) to hourly/global stats tables; migration and journal updated.
  • Error Handling (apps/gateway/src/app.ts): added a global error handler for UnsupportedAudioFormatError that logs metadata and returns HTTP 400 JSON with message and metadata.
  • E2E Tests (apps/gateway/src/audio.e2e.ts): added a parameterized E2E suite posting base64 WAV input_audio to /v1/chat/completions, seeding the DB/API key and asserting audio token counts and audio_input_cost.
  • Playground / Chat UI (apps/playground/src/..., apps/playground/package.json): playground model/provider mapping includes audio; chat UI and prompt input accept, render, and attach audio files; dependency bumped.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested reviewers

  • steebchen
  • smakosh
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 44.00%, below the required 80.00% threshold. Resolution: write docstrings for the functions missing them.

✅ Passed checks (4 passed)

  • Linked Issues check: ✅ Passed. Skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check: ✅ Passed. Skipped because no linked issues were found for this pull request.
  • Description check: ✅ Passed. Skipped; CodeRabbit’s high-level summary is enabled.
  • Title check: ✅ Passed. The title "feat: audio support for google models with integration into playground" accurately describes the main change: adding audio support for Google models throughout the system, including the playground UI.



@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 6

🧹 Nitpick comments (4)
apps/gateway/src/chat/tools/transform-response-to-openai.ts (1)

326-565: 💤 Low value

Forward-compat: only Google/Anthropic/AWS Bedrock branches forward audioInputTokens.

Other provider branches (inference.net/together-ai/groq, alibaba, openai/azure/mistral/novita, bytedance, xai, embercloud/zai, default) still call buildUsageObject without forwarding audioInputTokens. This is fine for this PR since only Google supports audio, but when adding audio to another provider later, both the relevant buildUsageObject and applyExtendedUsageFields call sites must also be updated; otherwise prompt_tokens_details.audio_tokens will silently fall back to upstream/0 for those providers. No action required now.

packages/actions/src/transform-google-messages.ts (1)

92-109: 💤 Low value

resolveGoogleAudioMime error target is wrong for unrecognised providers

When providerId is not in either family the code correctly builds the union map (all GoogleAudioFormat values will always resolve), but if the dead-code error path at line 103 were somehow reached the target string would always say "Google AI Studio" regardless of the actual provider — VERTEX_FAMILY.has(providerId ?? "") is false so the ternary falls to the else branch.

Given that the fallback map contains every value in GoogleAudioFormat, this path is presently unreachable. If new formats are added to GoogleAudioFormat without updating the MIME maps the sentinel message becomes actively misleading.

🛡️ Suggested fix – use the actual provider ID as the target label
 	const target = VERTEX_FAMILY.has(providerId ?? "")
 		? "Vertex AI"
-		: "Google AI Studio";
+		: AI_STUDIO_FAMILY.has(providerId ?? "")
+			? "Google AI Studio"
+			: (providerId ?? "unknown provider");
 	throw new UnsupportedAudioFormatError(format, target);
apps/gateway/src/audio.e2e.ts (1)

170-173: 💤 Low value

Unreachable inputPrice fallback

provider.inputAudioPrice is always defined here — line 55 already filters out any provider where inputAudioPrice === undefined. The ?? provider.inputPrice ?? 0 fallback is dead code.

♻️ Proposed cleanup
-		const expected =
-			audioTokens * (provider.inputAudioPrice ?? provider.inputPrice ?? 0);
+		const expected = audioTokens * provider.inputAudioPrice;
packages/actions/src/transform-google-messages.spec.ts (1)

105-122: 💤 Low value

Consider using the .rejects pattern for consistency

The other error-assertion tests in this file use await expect(...).rejects.toBeInstanceOf(...). The manual try/catch here works but produces a confusing failure message if no error is thrown: the sentinel throw new Error("expected throw") gets caught, and the assertion reports that Error is not an instance of UnsupportedAudioFormatError rather than "expected function to throw".

♻️ Idiomatic rewrite using `.rejects`
-	it("attaches format and providerTarget on the error", async () => {
-		try {
-			await transformGoogleMessages(
-				audioMessages("aiff"),
-				false,
-				20,
-				null,
-				undefined,
-				"quartz",
-			);
-			throw new Error("expected throw");
-		} catch (err) {
-			expect(err).toBeInstanceOf(UnsupportedAudioFormatError);
-			const e = err as UnsupportedAudioFormatError;
-			expect(e.format).toBe("aiff");
-			expect(e.providerTarget).toBe("Vertex AI");
-		}
-	});
+	it("attaches format and providerTarget on the error", async () => {
+		const promise = transformGoogleMessages(
+			audioMessages("aiff"),
+			false,
+			20,
+			null,
+			undefined,
+			"quartz",
+		);
+		const err = await promise.catch((e: unknown) => e);
+		expect(err).toBeInstanceOf(UnsupportedAudioFormatError);
+		const e = err as UnsupportedAudioFormatError;
+		expect(e.format).toBe("aiff");
+		expect(e.providerTarget).toBe("Vertex AI");
+	});
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@apps/gateway/src/audio.e2e.ts`:
- Line 114: The test suite currently uses getConcurrentTestOptions() in the
describe call which enables concurrent:true; replace getConcurrentTestOptions()
with getTestOptions() so the suite does not run tests in parallel—update the
describe invocation that references getConcurrentTestOptions() to call
getTestOptions() instead (ensure any import or reference to getTestOptions() is
present and used in place of getConcurrentTestOptions()).

In `@apps/gateway/src/chat/chat.ts`:
- Around line 3759-3760: The cached-response replay paths that call
calculateCosts() are not reconstructing audio token splits, so cached Gemini
audio requests are billed as text-only; update the cache-hit replay logic (the
branches that set audioInputTokens/audioInputCost and later call
calculateCosts()) to rebuild prompt_tokens_details.audio_tokens and any per-part
split (e.g., prompt_tokens_details.audio_tokens.{input,replay} or the structure
calculateCosts() expects) from the cached metadata before invoking
calculateCosts(), ensuring costs.audioInputTokens and costs.audioInputCost are
derived from the reconstructed audio token fields; modify the replay code paths
referenced near the audioInputTokens/audioInputCost assignments (also apply the
same fix at the other occurrence noted) so cached audio requests retain their
audio token/cost data.
- Around line 1811-1823: The auto-selection path is excluding non-Claude auto
candidates for audio requests (because allowedAutoModels contains only Claude),
causing model: "auto" to fall through to the hardcoded claude/anthropic
fallback; to fix, when the original request model === "auto" and hasAudio is
true, relax the allowedAutoModels restriction so you only filter out providers
that explicitly cannot do audio (i.e., keep the provider.audio !== true check
and the audioFormats.every(...) googleProviderSupportsAudioFormat check) but
skip the allowedAutoModels membership test (or expand allowedAutoModels to
include other known audio-capable auto providers like Gemini); update the
selection logic around hasAudio, provider.audio, audioFormats,
googleProviderSupportsAudioFormat and allowedAutoModels so an audio-capable auto
candidate is preserved and the code does not drop into the claude-haiku-4-5 /
anthropic fallback incorrectly.
- Around line 2041-2042: The audio gating (hasAudio and audioFormats) is only
enforced inside the sameProviderMappings.length > 1 branch, allowing
direct/pinned providers or empty eligibleMappings to bypass validation; modify
the provider-selection logic (the code that iterates/filters eligibleMappings
and sameProviderMappings) to always filter a provider's mappings by audio
eligibility using hasAudio and audioFormats before choosing that provider, and
if a provider ends up with zero audio-eligible mappings reject that
provider/request (return an error) instead of falling back to
single-mapping/default-region logic; update the sameProviderMappings handling
and the fallback path so both respect the audio filter and use the filtered
mapping list for selection.

In `@apps/gateway/src/chat/tools/messages-contain-audio.spec.ts`:
- Around line 122-124: In the tests in messages-contain-audio.spec.ts where the
local variable model is checked (currently "if (!model) { continue }"), stop
silently skipping and assert presence instead; replace the continue with an
explicit assertion (e.g., expect(model).toBeDefined() or throw a clear error) so
the test fails if a model ID is missing—apply this change for each occurrence
where model is guarded (the three spots shown) to surface catalog regressions.

In `@packages/models/src/models/google.ts`:
- Line 1822: The Groq mapping for model "gemma2-9b-it" incorrectly sets audio:
true which routes audio input to a text-only model; locate the mapping entry
that references "gemma2-9b-it" in packages/models/src/models/google.ts and
remove the audio: true flag (or explicitly set audio: false/remove audio-related
content types) so this provider/model pair only accepts text inputs and won't be
selected for audio/input_audio chat completions.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: cbe06e32-bab5-4ab9-a048-4f05afbba3c7

📥 Commits

Reviewing files that changed from the base of the PR and between 1dfac9c and 2dc042a.

⛔ Files ignored due to path filters (1)
  • apps/gateway/src/test-fixtures/test-audio.wav is excluded by !**/*.wav
📒 Files selected for processing (25)
  • apps/gateway/src/app.ts
  • apps/gateway/src/audio.e2e.ts
  • apps/gateway/src/chat/chat.ts
  • apps/gateway/src/chat/schemas/completions.ts
  • apps/gateway/src/chat/tools/extract-token-usage.ts
  • apps/gateway/src/chat/tools/messages-contain-audio.spec.ts
  • apps/gateway/src/chat/tools/messages-contain-audio.ts
  • apps/gateway/src/chat/tools/parse-provider-response.ts
  • apps/gateway/src/chat/tools/transform-response-to-openai.ts
  • apps/gateway/src/lib/costs.spec.ts
  • apps/gateway/src/lib/costs.ts
  • apps/gateway/src/responses/tools/convert-chat-to-responses.ts
  • apps/gateway/src/responses/tools/convert-streaming-to-responses.ts
  • apps/worker/src/services/global-stats-aggregator.ts
  • apps/worker/src/services/project-stats-aggregator.ts
  • packages/actions/src/prepare-request-body.ts
  • packages/actions/src/transform-google-messages.spec.ts
  • packages/actions/src/transform-google-messages.ts
  • packages/db/migrations/1778260730_nervous_karnak.sql
  • packages/db/migrations/meta/1778260730_snapshot.json
  • packages/db/migrations/meta/_journal.json
  • packages/db/src/schema.ts
  • packages/models/src/models.ts
  • packages/models/src/models/google.ts
  • packages/models/src/types.ts

Comment on lines +122 to +124
if (!model) {
continue;
}


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Don’t silently skip missing model IDs in these assertions.

Lines 122, 148, and 168 currently continue on missing models, which can hide catalog regressions and still pass the suite. Assert model existence instead.

Suggested fix
-		if (!model) {
-			continue;
-		}
+		expect(model, `Missing model definition for ${id}`).toBeDefined();
+		if (!model) {
+			throw new Error(`Missing model definition for ${id}`);
+		}

Also applies to: 148-150, 168-170


@RATCHAW RATCHAW marked this pull request as draft May 9, 2026 05:41
RATCHAW and others added 2 commits May 9, 2026 12:29
- When the request includes input_audio, only consider provider
  mappings with audio: true and a supported audio format; return 400
  if none qualify.
- Let non-allowlisted models participate in auto-routing when they
  support audio.
- Record audio_tokens for streaming and cached responses.
- Run the audio e2e suite sequentially.
- Remove the incorrect audio: true flag on Groq gemma2-9b-it.
@RATCHAW RATCHAW marked this pull request as ready for review May 9, 2026 11:42
RATCHAW and others added 3 commits May 9, 2026 13:13
- Introduced `audioInputCost` and `audioInputTokens` fields in the activity and logs schemas.
- Updated related calculations and API responses to include audio input costs.
- Ensured consistency across various API interfaces for audio input handling.
RATCHAW added 2 commits May 11, 2026 12:30
- Upgraded `@llmgateway/ai-sdk-provider` to version 3.8.0, introducing new audio capabilities.
- Added `audio` field to model provider mapping schema and updated related components to handle audio attachments.
- Enhanced chat UI to support audio messages, including playback controls for audio files.
- Updated internal models and types to reflect audio capabilities across the application.

This update improves the overall functionality and user experience for audio input handling.

@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (1)
apps/playground/src/components/playground/chat-ui.tsx (1)

798-809: ⚡ Quick win

Consider setting accept="" when neither images nor audio are supported.

When supportsImages and supportsAudio are both false, the accept prop is undefined, which causes matchesAccept in prompt-input.tsx to return true for all file types. Users could paste files, see them appear in the attachments list, but those files won't be sent with the message (lines 597-608 only attach files when the model supports them).

Consider setting accept="" to explicitly reject all files when neither capability is supported, which would trigger the "No files match the accepted types" error and provide clearer feedback.

📋 Suggested improvement
 accept={
 	supportsImages && supportsAudio
 		? "image/*,audio/*"
 		: supportsImages
 			? "image/*"
 			: supportsAudio
 				? "audio/*"
-				: undefined
+				: ""
 }

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 76f38cf5-034d-413c-8706-a53e666454fa

📥 Commits

Reviewing files that changed from the base of the PR and between 7eafe27 and 1b7756b.

⛔ Files ignored due to path filters (5)
  • apps/code/src/lib/api/v1.d.ts is excluded by !**/v1.d.ts
  • apps/playground/src/lib/api/v1.d.ts is excluded by !**/v1.d.ts
  • apps/ui/src/lib/api/v1.d.ts is excluded by !**/v1.d.ts
  • ee/admin/src/lib/api/v1.d.ts is excluded by !**/v1.d.ts
  • pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (8)
  • apps/api/src/routes/internal-models.ts
  • apps/playground/package.json
  • apps/playground/src/components/ai-elements/prompt-input.tsx
  • apps/playground/src/components/playground/chat-page-client.tsx
  • apps/playground/src/components/playground/chat-ui.tsx
  • apps/playground/src/lib/fetch-models.ts
  • apps/playground/src/lib/mapmodels.ts
  • apps/playground/src/lib/types.ts
✅ Files skipped from review due to trivial changes (2)
  • apps/playground/package.json
  • apps/playground/src/lib/types.ts

RATCHAW and others added 5 commits May 11, 2026 13:12
- Introduced `audios` field in message schemas across various routes to handle audio attachments.
- Updated validation schemas to ensure audio content can be included in messages.
- Enhanced chat UI and related components to support audio playback and management.
- Added database migration to accommodate the new `audios` column in the message table.

This update improves the functionality of chat messages by allowing users to send and receive audio files, enhancing the overall user experience.
- Introduced a new UI component for audio attachments in the PromptInput, featuring a Music2Icon for visual representation.
- Updated the attachment handling logic to support audio type, enhancing user interaction with audio files.
- Removed the previous audio playback element to streamline the attachment display.

This update improves the user experience by providing a clearer interface for managing audio attachments.

RATCHAW commented May 11, 2026

@coderabbitai resume


coderabbitai Bot commented May 11, 2026

✅ Actions performed

Reviews resumed.

@RATCHAW RATCHAW changed the title feat: support Google Gemini audio input feat: audio support for google models with integration into playground May 11, 2026
