feat(replay): support OpenAI chat-format JSONL in parseJsonlText #1011

Open
ggoldani wants to merge 1 commit into rohitg00:main from ggoldani:feat/openai-jsonl-parser

@ggoldani

@ggoldani ggoldani commented Jul 4, 2026

Problem

parseJsonlText only recognizes the Claude Code transcript format (type discriminator + structured message.content arrays). Agents that emit JSONL in the OpenAI chat format, notably Hermes Agent, produce zero observations when imported via /replay import-jsonl.

The parser silently drops every line because the OpenAI format uses top-level role/content fields and tool_calls arrays instead of the type discriminator.

Impact

Hermes Agent already has a connect/hermes.ts adapter and is listed as a supported agent, but its JSONL transcripts were invisible to replay. A real Hermes session of ~50 messages (prompts, tool calls, tool results, assistant responses) yields 0 observations without this fix.

This means: no embeddings, no search index entries, no crystal/lesson derivation, and an empty timeline for any OpenAI-format transcript.

Solution

Add a normalizeOpenAIEntry() function that runs as a pre-processing step inside the existing for loop, converting OpenAI chat messages to Claude Code format before the existing parser logic sees them.

Entries already in Claude Code format (those with entry.type or entry.message) pass through untouched, so existing transcripts are parsed exactly as before.
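The pass-through guard can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the PR's actual code: `RawEntry`, `isClaudeCodeEntry`, and `normalizeLines` are hypothetical names, the `normalize` callback stands in for normalizeOpenAIEntry(), and skipping malformed lines is an assumed behavior.

```typescript
// Hypothetical minimal entry shape; the real JsonlEntry interface has more fields.
type RawEntry = Record<string, unknown>;

// An entry counts as Claude Code format if it already carries `type` or `message`.
function isClaudeCodeEntry(entry: RawEntry): boolean {
  return entry["type"] !== undefined || entry["message"] !== undefined;
}

// Parse JSONL text and run the normalizer only on entries that need it.
function normalizeLines(
  text: string,
  normalize: (entry: RawEntry) => RawEntry,
): RawEntry[] {
  const entries: RawEntry[] = [];
  for (const line of text.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "") continue;
    let parsed: unknown;
    try {
      parsed = JSON.parse(trimmed);
    } catch {
      continue; // assumption for this sketch: malformed lines are skipped
    }
    if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
      continue; // only JSON objects can be transcript entries
    }
    const entry = parsed as RawEntry;
    entries.push(isClaudeCodeEntry(entry) ? entry : normalize(entry));
  }
  return entries;
}
```

Because the guard runs before the normalizer, a Claude Code entry is returned as the same parsed object, which is what keeps mixed-format files safe in this sketch.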

Mapped patterns

| OpenAI shape | Claude Code equivalent |
| --- | --- |
| {role:"user", content: string} | type:"user" + message.content |
| {role:"tool", tool_call_id, content} | type:"user" + tool_result content block |
| {role:"assistant", content, tool_calls} | type:"assistant" + text + tool_use blocks |
| {role:"assistant", content: string} | type:"assistant" + text block |

tool_calls[].function.arguments (always a JSON string in OpenAI format) is parsed to an object, matching how Claude Code stores tool_use.input.
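As an illustration of the assistant mapping and the argument parsing, here is a hedged sketch. The interfaces and the names `parseToolArguments` and `toAssistantContent` are hypothetical, modeled on the shapes described in this PR rather than taken from it. The fallback of wrapping an unparseable string as `{ raw }` is an assumption; the PR does not say how invalid argument strings are handled.

```typescript
// Assumed shapes for illustration, modeled on the OpenAI chat format
// and the Claude Code content blocks described in this PR.
interface OpenAIToolCall {
  id: string;
  type?: string;
  function: { name: string; arguments: string };
}

interface TextBlock {
  type: "text";
  text: string;
}

interface ToolUseBlock {
  type: "tool_use";
  id: string;
  name: string;
  input: unknown;
}

// Parse the JSON-string arguments into an object.
// Assumption: on invalid JSON, keep the raw string instead of throwing.
function parseToolArguments(raw: string): unknown {
  try {
    return JSON.parse(raw);
  } catch {
    return { raw };
  }
}

// Build the content array: an optional text block, then one tool_use block per call.
function toAssistantContent(
  content: string | null | undefined,
  toolCalls: OpenAIToolCall[] = [],
): Array<TextBlock | ToolUseBlock> {
  const blocks: Array<TextBlock | ToolUseBlock> = [];
  if (typeof content === "string" && content.length > 0) {
    blocks.push({ type: "text", text: content });
  }
  for (const call of toolCalls) {
    blocks.push({
      type: "tool_use",
      id: call.id,
      name: call.function.name,
      input: parseToolArguments(call.function.arguments),
    });
  }
  return blocks;
}
```

An assistant message with neither text nor tool calls yields an empty array here, which a caller could use to skip the entry.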

Testing

  • 19 existing tests still pass (no regressions)
  • 2 new tests added:
    • parses OpenAI chat-format JSONL — verifies all 4 patterns with a fixture
    • does not break Claude Code format when OpenAI entries are mixed in — verifies backward compatibility
  • Real-world validation: a 50-message Hermes session went from 0 → 49 observations

Files changed

  • src/replay/jsonl-parser.ts: normalizeOpenAIEntry() + 4 fields added to JsonlEntry interface
  • test/fixtures/jsonl/openai-format.jsonl: new fixture
  • test/replay.test.ts: 2 new test cases

Summary by CodeRabbit

  • New Features

    • JSONL replay now supports both Claude Code and OpenAI-style chat transcripts.
    • OpenAI conversations with user, assistant, and tool messages are now normalized automatically during parsing.
  • Bug Fixes

    • Assistant tool calls are now captured more reliably, including structured command arguments.
    • Mixed-format JSONL files continue to parse correctly without breaking existing replay behavior.
  • Tests

    • Added coverage for OpenAI-format transcript parsing and mixed-format JSONL handling.

Hermes Agent and other OpenAI-compatible agents emit JSONL transcripts in
the OpenAI chat format (top-level role/content, tool_calls arrays,
role:"tool" messages). The replay parser only recognized Claude Code's
format (type discriminator, structured content arrays), producing zero
observations from these transcripts.

Add a normalizeOpenAIEntry() pre-processing step that converts the four
OpenAI message shapes to Claude Code equivalents before the existing
parser logic runs. Entries already in Claude Code format pass through
untouched, so the change is backward-compatible.

Mapped patterns:
  - {role:"user", content}               → type:"user" + message.content
  - {role:"tool", tool_call_id, content} → type:"user" + tool_result block
  - {role:"assistant", tool_calls}       → type:"assistant" + tool_use blocks
  - {role:"assistant", content}          → type:"assistant" + text block

Tested with a real Hermes session: 0 → 49 observations extracted.
All 19 existing replay tests + 2 new tests pass.
@vercel

vercel Bot commented Jul 4, 2026

@ggoldani is attempting to deploy a commit to the rohitg00's projects Team on Vercel.

A member of the Team first needs to authorize it.

@coderabbitai

coderabbitai Bot commented Jul 4, 2026

Contributor

Walkthrough

Adds support for normalizing OpenAI-style chat JSONL entries (role/content/tool_calls) into the Claude Code transcript format within the JSONL replay parser, including a new normalization helper, updated parsing loop, a test fixture, and new tests covering pure and mixed-format inputs.

Changes

OpenAI JSONL Normalization

| Layer / File(s) | Summary |
| --- | --- |
| Entry type extension and normalization logic (src/replay/jsonl-parser.ts) | JsonlEntry gains OpenAI-style fields (role, content, tool_calls, tool_call_id); new normalizeOpenAIEntry converts OpenAI user/tool/assistant entries into Claude Code type/message shape, parsing tool call arguments and skipping phantom assistant entries. |
| Parsing loop wiring (src/replay/jsonl-parser.ts) | parseJsonlText now normalizes each raw JSONL object via normalizeOpenAIEntry before extracting session/cwd/timestamps and emitting observations. |
| Fixture and test coverage (test/fixtures/jsonl/openai-format.jsonl, test/replay.test.ts) | Adds a 4-message OpenAI-format fixture and two new tests validating parsing of OpenAI-only and mixed OpenAI/Claude Code JSONL inputs. |

Estimated code review effort: 2 (Simple) | ~15 minutes

Sequence Diagram(s)

sequenceDiagram
  participant Caller
  participant parseJsonlText
  participant normalizeOpenAIEntry

  Caller->>parseJsonlText: JSONL text (OpenAI or Claude Code format)
  loop each raw entry
    parseJsonlText->>normalizeOpenAIEntry: rawEntry
    alt already Claude Code format
      normalizeOpenAIEntry-->>parseJsonlText: entry unchanged
    else OpenAI user/tool/assistant format
      normalizeOpenAIEntry->>normalizeOpenAIEntry: map role/content/tool_calls to type/message
      normalizeOpenAIEntry-->>parseJsonlText: normalized entry
    end
    parseJsonlText->>parseJsonlText: extract session/cwd/timestamp, emit observation
  end
  parseJsonlText-->>Caller: observations
🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly summarizes the main change: adding OpenAI chat-format JSONL support to parseJsonlText. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@coderabbitai coderabbitai Bot left a comment

Contributor

🧹 Nitpick comments (1)
src/replay/jsonl-parser.ts (1)

85-155: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Replace WHAT-comments with self-documenting structure.

The inline comments at Lines 99, 105, 110, and 123 (e.g. // { role: "user", content: "..." }, // Already in Claude Code format — leave as-is.) describe what each branch matches rather than documenting non-obvious rationale. Splitting the three branches into small, named helpers (e.g. normalizeUserEntry, normalizeToolResultEntry, normalizeAssistantEntry) would make the shape-matching self-evident without comments.

As per coding guidelines, "In TypeScript source code, avoid code comments explaining WHAT — use clear naming instead."

♻️ Example restructuring
-  // { role: "user", content: "..." }
-  if (role === "user" && content !== undefined) {
-    return { ...entry, type: "user", message: { role: "user", content } };
-  }
-
-  // { role: "tool", tool_call_id: "...", content: "..." }
-  if (role === "tool") {
-    const toolUseId = typeof entry.tool_call_id === "string" ? entry.tool_call_id : "";
-    return {
-      ...entry,
-      type: "user",
-      message: {
-        role: "user",
-        content: [{ type: "tool_result", tool_use_id: toolUseId, content }],
-      },
-    };
-  }
+  if (role === "user" && content !== undefined) return normalizeUserEntry(entry, content);
+  if (role === "tool") return normalizeToolResultEntry(entry, content);
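
The two helpers named in the suggestion could look roughly like this. The bodies are lifted from the removed lines of the diff above; the `Entry` type is a hypothetical stand-in for JsonlEntry, and the helper signatures and the `dispatchByRole` name are assumptions, since the review names the helpers without defining them.

```typescript
// Hypothetical stand-in for the JsonlEntry interface in src/replay/jsonl-parser.ts.
interface Entry {
  type?: string;
  role?: string;
  content?: unknown;
  tool_call_id?: unknown;
  message?: { role: string; content: unknown };
  [key: string]: unknown;
}

// Wraps a plain OpenAI user message in the Claude Code type/message shape.
function normalizeUserEntry(entry: Entry, content: unknown): Entry {
  return { ...entry, type: "user", message: { role: "user", content } };
}

// Wraps an OpenAI tool message as a user entry holding one tool_result block.
function normalizeToolResultEntry(entry: Entry, content: unknown): Entry {
  const toolUseId = typeof entry.tool_call_id === "string" ? entry.tool_call_id : "";
  return {
    ...entry,
    type: "user",
    message: {
      role: "user",
      content: [{ type: "tool_result", tool_use_id: toolUseId, content }],
    },
  };
}

// Dispatcher covering only the two branches shown in the diff;
// the assistant branch is omitted from this sketch.
function dispatchByRole(entry: Entry): Entry {
  const { role, content } = entry;
  if (role === "user" && content !== undefined) return normalizeUserEntry(entry, content);
  if (role === "tool") return normalizeToolResultEntry(entry, content);
  return entry;
}
```
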
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/replay/jsonl-parser.ts` around lines 85 - 155, Refactor
normalizeOpenAIEntry in jsonl-parser.ts to remove the WHAT-style inline comments
by splitting the role-specific branches into small named helpers such as
normalizeUserEntry, normalizeToolResultEntry, and normalizeAssistantEntry. Keep
the existing behavior the same, but make the shape matching obvious through the
helper names and use normalizeOpenAIEntry as the dispatcher so the code no
longer relies on comments to explain each branch.

Source: Coding guidelines

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: df86f2ac-f07c-48c9-9dc2-0cf455ac5b14

📥 Commits

Reviewing files that changed from the base of the PR and between 93ae9bc and fb01e18.

📒 Files selected for processing (3)
  • src/replay/jsonl-parser.ts
  • test/fixtures/jsonl/openai-format.jsonl
  • test/replay.test.ts
