
feat(room-io): add jsonFormat option for timed transcription output #1305

Merged
toubatbrian merged 7 commits into main from claude/jolly-lovelace-cj98X
Apr 30, 2026

Conversation

@toubatbrian
Contributor

🤖 This is an automated Claude Code routine created by @toubatbrian. It is currently in an experimental stage. This PR was auto-generated by porting a merged Python PR into agents-js.

Summary

Ports livekit/agents#5472 — "feat(room-io): add json_format option for timed transcription output" — into agents-js.

When jsonFormat is enabled on RoomOutputOptions, every chunk published by ParticipantTranscriptionOutput on the lk.transcription datastream topic is emitted as a newline-delimited JSON object instead of a raw string. Each object has:

  • text: the transcript chunk
  • start_time: seconds since the segment started (only set if the chunk is a TimedString with startTime)
  • end_time: seconds since the segment started (only set if the chunk is a TimedString with endTime)
  • confidence: optional STT confidence (when present)
  • start_time_offset: optional segment offset (when present)

Subscribers can parse the stream line-by-line (each chunk ends with \n).
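
For illustration, a subscriber-side parser for this format might look like the sketch below. It is not part of this PR: `TimedTranscriptChunk` and `TranscriptLineParser` are hypothetical names, and the sketch assumes the client receives the stream as string fragments that may split a line in two. Raw newlines inside `text` are escaped by JSON encoding, so splitting on `\n` is safe.

```typescript
// Wire shape as listed in the summary above; field names are snake_case.
interface TimedTranscriptChunk {
  text: string;
  start_time?: number; // seconds since the segment started
  end_time?: number; // seconds since the segment started
  confidence?: number;
  start_time_offset?: number;
}

// Hypothetical helper (not from the PR): buffers incoming fragments and
// returns one parsed object per complete line.
class TranscriptLineParser {
  private buffer = '';

  push(fragment: string): TimedTranscriptChunk[] {
    this.buffer += fragment;
    const lines = this.buffer.split('\n');
    // The last element is '' when the fragment ended on a newline,
    // otherwise it is an incomplete line that is kept for the next push.
    this.buffer = lines.pop() ?? '';
    return lines
      .filter((line) => line.length > 0)
      .map((line) => JSON.parse(line) as TimedTranscriptChunk);
  }
}

// Usage: the second object is split across two fragments.
const parser = new TranscriptLineParser();
const first = parser.push('{"text":"Hello ","end_time":0.42}\n{"text":"wor');
const second = parser.push('ld","start_time":0.42,"end_time":0.9}\n');
```

Here `first` holds only the complete `"Hello "` chunk, and `second` holds the `"world"` chunk once its closing newline arrives.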

Ported changes

  1. RoomOutputOptions.jsonFormat: boolean (agents/src/voice/room_io/room_io.ts) — new option, defaults to false.
  2. ParticipantTranscriptionOutput (agents/src/voice/room_io/_output.ts) — now takes an optional ParticipantTranscriptionOutputOptions 4th constructor argument with { jsonFormat?: boolean }. Overrides captureText so the JSON serialization happens before handleCaptureText.
  3. TranscriptionSynchronizer (agents/src/voice/transcription/synchronizer.ts) — SegmentSynchronizerImpl now writes TimedString items (with endTime reflecting synchronized playback timing) to its output stream instead of plain strings. The downstream TextOutput.captureText(string | TimedString) contract already accepted both, so the only downstream that materially changes behavior is ParticipantTranscriptionOutput when jsonFormat: true.
  4. Changeset (.changeset/room-io-json-transcription.md): patch bump for @livekit/agents (initially a minor bump, changed to patch in 979bafa during review).
  5. Inline // Ref: python … comments on every ported line, per the agents-js porting guide in CLAUDE.md.
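
The serialization step described in item 2 can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from `_output.ts`: `TimedStringLike`, `WireChunk`, and `encodeChunk` are hypothetical names, and the camelCase field names on `TimedStringLike` are assumed from the `startTime`/`endTime` naming used in this description.

```typescript
// Hypothetical stand-in for the agents-js TimedString; only the fields
// mentioned in this PR description are modeled.
interface TimedStringLike {
  text: string;
  startTime?: number;
  endTime?: number;
  confidence?: number;
  startTimeOffset?: number;
}

// snake_case wire shape, matching the field list in the summary.
interface WireChunk {
  text: string;
  start_time?: number;
  end_time?: number;
  confidence?: number;
  start_time_offset?: number;
}

// Hypothetical helper: turns one captured chunk into one JSON line.
function encodeChunk(chunk: string | TimedStringLike): string {
  const payload: WireChunk = {
    text: typeof chunk === 'string' ? chunk : chunk.text,
  };
  if (typeof chunk !== 'string') {
    // Optional fields are only written when present on the chunk.
    if (chunk.startTime !== undefined) payload.start_time = chunk.startTime;
    if (chunk.endTime !== undefined) payload.end_time = chunk.endTime;
    if (chunk.confidence !== undefined) payload.confidence = chunk.confidence;
    if (chunk.startTimeOffset !== undefined) {
      payload.start_time_offset = chunk.startTimeOffset;
    }
  }
  // One JSON object per line so subscribers can split on '\n'.
  return JSON.stringify(payload) + '\n';
}

const plain = encodeChunk(' ');
const timed = encodeChunk({ text: 'hi', endTime: 1.5 });
```

With these inputs, `plain` is `{"text":" "}` followed by a newline, and `timed` is `{"text":"hi","end_time":1.5}` followed by a newline.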

Implementation nuances (JS vs Python)

Cases where strict code-level parity was not practical:

  • TextOutputOptions vs RoomOutputOptions — Python has a dedicated TextOutputOptions dataclass on RoomOptions; agents-js keeps all room output settings inline on RoomOutputOptions. jsonFormat is therefore added directly to RoomOutputOptions and threaded into createTranscriptionOutput through this.outputOptions.jsonFormat, rather than routed through a nested options object.
  • Protobuf serialization — the Python patch builds a livekit.protocol.agent_pb.TimedString protobuf, then serializes via MessageToDict(preserving_proto_field_name=True). The JS port emits the same wire shape directly ({ text, start_time?, end_time?, confidence?, start_time_offset? }) without introducing a @livekit/protocol runtime dependency on the JS side. Field names remain snake_case to match the Python output byte-for-byte for consumers.
  • Time units — Python uses seconds throughout. agents-js uses ms internally (startWallTime = Date.now()), so the synchronizer divides by 1000 when stamping endTime on the emitted TimedString, keeping the JSON output on the same seconds-based scale as Python.
  • Synchronizer "remaining" branch: the JS SegmentSynchronizerImpl.mainTask has a small fallback (if (textCursor < sentence.length)) that emits leftover whitespace/punctuation between the last word and the end of a sentence. Python has no direct equivalent because its word-splitting handles this differently. The initial port emitted this leftover chunk as a plain string (i.e. {"text": " "} with no timing when jsonFormat: true) rather than a fabricated TimedString; a follow-up commit in this PR wraps it as a TimedString with endTime, matching every other emission on the same output stream.
  • Python uv.lock / pyproject.toml bumps (new cerebras / krisp / runway workspace entries and livekit-protocol>=1.1.6) were intentionally not ported — those are Python-dependency-manager changes.
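
The time-unit conversion noted above amounts to a single division. A minimal sketch, assuming wall-clock timestamps taken from `Date.now()`; the helper name `elapsedSeconds` is hypothetical and the synchronizer's actual expression may differ:

```typescript
// Hypothetical helper: elapsed time since the segment started, converted
// from the millisecond scale used internally to the seconds used on the wire.
function elapsedSeconds(startWallTimeMs: number, nowMs: number): number {
  return (nowMs - startWallTimeMs) / 1000;
}

// Example with fixed timestamps instead of Date.now() so the result is
// reproducible: 1250 ms after the segment start.
const startWallTime = 1700000000000;
const endTime = elapsedSeconds(startWallTime, startWallTime + 1250);
```

Here `endTime` is `1.25`, which is the value that would appear as `end_time` in the JSON output.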

Test plan

  • pnpm build:agents — passes
  • pnpm lint — passes (no new warnings introduced)
  • pnpm exec vitest run src/voice/room_io src/voice/transcription — 23/23 pass
  • Prettier formatting check on edited files — unchanged
  • Manual verification with a live agent: enable jsonFormat: true on RoomOutputOptions, subscribe to lk.transcription from a client, confirm each message is a parseable JSON object with text and end_time (and start_time when the upstream STT provides it).

cc @toubatbrian @livekit/agent-devs for review.


Generated by Claude Code

Port of livekit/agents#5472. Adds `jsonFormat` to `RoomOutputOptions`; when
enabled, chunks published on the `lk.transcription` datastream topic are
serialized as newline-delimited JSON objects with `text` and
`start_time`/`end_time` fields when the chunk is a `TimedString`. The
`TranscriptionSynchronizer` now emits `TimedString` items with `end_time`
reflecting synchronized playback timing so subscribers can align chunks
against playback without extra bookkeeping.
@changeset-bot

changeset-bot Bot commented Apr 23, 2026

🦋 Changeset detected

Latest commit: 64aac55

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 28 packages
| Name | Type |
| --- | --- |
| @livekit/agents | Patch |
| @livekit/agents-plugin-anam | Patch |
| @livekit/agents-plugin-assemblyai | Patch |
| @livekit/agents-plugin-baseten | Patch |
| @livekit/agents-plugin-bey | Patch |
| @livekit/agents-plugin-cartesia | Patch |
| @livekit/agents-plugin-cerebras | Patch |
| @livekit/agents-plugin-deepgram | Patch |
| @livekit/agents-plugin-elevenlabs | Patch |
| @livekit/agents-plugin-google | Patch |
| @livekit/agents-plugin-hedra | Patch |
| @livekit/agents-plugin-inworld | Patch |
| @livekit/agents-plugin-lemonslice | Patch |
| @livekit/agents-plugin-liveavatar | Patch |
| @livekit/agents-plugin-livekit | Patch |
| @livekit/agents-plugin-minimax | Patch |
| @livekit/agents-plugin-mistral | Patch |
| @livekit/agents-plugin-neuphonic | Patch |
| @livekit/agents-plugin-openai | Patch |
| @livekit/agents-plugin-phonic | Patch |
| @livekit/agents-plugin-resemble | Patch |
| @livekit/agents-plugin-rime | Patch |
| @livekit/agents-plugin-runway | Patch |
| @livekit/agents-plugin-sarvam | Patch |
| @livekit/agents-plugin-silero | Patch |
| @livekit/agents-plugins-test | Patch |
| @livekit/agents-plugin-trugen | Patch |
| @livekit/agents-plugin-xai | Patch |


@CLAassistant

CLAassistant commented Apr 23, 2026

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

✅ toubatbrian
❌ claude

chatgpt-codex-connector[bot]

This comment was marked as resolved.

devin-ai-integration[bot]

This comment was marked as resolved.

claude and others added 3 commits April 23, 2026 11:36
…medString for trailing sentence fragment

- `ParticipantTranscriptionOutput` now stores the JSON-encoded payload (not the
  raw text) as `latestText` when `jsonFormat` is enabled, so the non-delta
  `FINAL=true` flush publishes the same newline-delimited JSON shape as interim
  chunks. Without this, `userTranscriptOutput` (which uses `isDeltaStream:
  false`) broke line-by-line JSON parsers on the terminal message. Mirrors the
  Python behavior in `_output.py`, where `text` is reassigned to the encoded
  payload before `_latest_text = text`.
- `SegmentSynchronizerImpl` now wraps the trailing sentence fragment (anything
  after the last word) as a `TimedString` with `endTime`, matching every other
  emission on the same output stream.
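
To illustrate the first fix in this commit, here is a simplified sketch of a non-delta publisher. It is an assumption-laden model, not the real `ParticipantTranscriptionOutput`: the class name `NonDeltaPublisherSketch`, its methods, and the `published` array are hypothetical, and actual publishing to the datastream is replaced by an in-memory list.

```typescript
// Hypothetical model of a non-delta output: each capture replaces the
// pending text, and the final flush publishes whatever was stored last.
class NonDeltaPublisherSketch {
  private latestText = '';
  private readonly jsonFormat: boolean;
  // Stand-in for messages sent on the datastream topic.
  readonly published: string[] = [];

  constructor(jsonFormat: boolean) {
    this.jsonFormat = jsonFormat;
  }

  capture(text: string): void {
    // Store the encoded payload, not the raw text, when jsonFormat is on,
    // so the final flush has the same newline-delimited JSON shape.
    this.latestText = this.jsonFormat ? JSON.stringify({ text }) + '\n' : text;
  }

  flush(): void {
    this.published.push(this.latestText);
    this.latestText = '';
  }
}

const publisher = new NonDeltaPublisherSketch(true);
publisher.capture('hello');
publisher.capture('hello world');
publisher.flush();
const finalMessage = publisher.published[0];
```

In this model `finalMessage` is a single JSON line carrying `hello world`, so a line-by-line parser on the subscriber side handles the terminal message the same way as interim chunks.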
devin-ai-integration[bot]

This comment was marked as resolved.

Comment thread .changeset/room-io-json-transcription.md Outdated
Contributor Author

@toubatbrian — done in 979bafa, the changeset is now patch.


Generated by Claude Code

@toubatbrian toubatbrian merged commit c92c0ef into main Apr 30, 2026
8 of 9 checks passed
@toubatbrian toubatbrian deleted the claude/jolly-lovelace-cj98X branch April 30, 2026 03:45

4 participants