feat(replay): support OpenAI chat-format JSONL in parseJsonlText#1011
feat(replay): support OpenAI chat-format JSONL in parseJsonlText#1011ggoldani wants to merge 1 commit into
Conversation
Hermes Agent and other OpenAI-compatible agents emit JSONL transcripts in
the OpenAI chat format (top-level role/content, tool_calls arrays,
role:"tool" messages). The replay parser only recognized Claude Code's
format (type discriminator, structured content arrays), producing zero
observations from these transcripts.
Add a normalizeOpenAIEntry() pre-processing step that converts the four
OpenAI message shapes to Claude Code equivalents before the existing
parser logic runs. Entries already in Claude Code format pass through
untouched, so the change is backward-compatible.
Mapped patterns:
- {role:"user", content} → type:"user" + message.content
- {role:"tool", tool_call_id, content} → type:"user" + tool_result block
- {role:"assistant", tool_calls} → type:"assistant" + tool_use blocks
- {role:"assistant", content} → type:"assistant" + text block
Tested with a real Hermes session: 0 → 49 observations extracted.
All 19 existing replay tests + 2 new tests pass.
|
@ggoldani is attempting to deploy a commit to the rohitg00's projects Team on Vercel. A member of the Team first needs to authorize it. |
📝 WalkthroughWalkthroughAdds support for normalizing OpenAI-style chat JSONL entries (role/content/tool_calls) into the Claude Code transcript format within the JSONL replay parser, including a new normalization helper, updated parsing loop, a test fixture, and new tests covering pure and mixed-format inputs. ChangesOpenAI JSONL Normalization
Estimated code review effort: 2 (Simple) | ~15 minutes Sequence Diagram(s)sequenceDiagram
participant Caller
participant parseJsonlText
participant normalizeOpenAIEntry
Caller->>parseJsonlText: JSONL text (OpenAI or Claude Code format)
loop each raw entry
parseJsonlText->>normalizeOpenAIEntry: rawEntry
alt already Claude Code format
normalizeOpenAIEntry-->>parseJsonlText: entry unchanged
else OpenAI user/tool/assistant format
normalizeOpenAIEntry->>normalizeOpenAIEntry: map role/content/tool_calls to type/message
normalizeOpenAIEntry-->>parseJsonlText: normalized entry
end
parseJsonlText->>parseJsonlText: extract session/cwd/timestamp, emit observation
end
parseJsonlText-->>Caller: observations
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
src/replay/jsonl-parser.ts (1)
85-155: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winReplace WHAT-comments with self-documenting structure.
The inline comments at Lines 99, 105, 110, and 123 (e.g.
// { role: "user", content: "..." },// Already in Claude Code format — leave as-is.) describe what each branch matches rather than documenting non-obvious rationale. Splitting the three branches into small, named helpers (e.g.normalizeUserEntry,normalizeToolResultEntry,normalizeAssistantEntry) would make the shape-matching self-evident without comments.As per coding guidelines, "In TypeScript source code, avoid code comments explaining WHAT — use clear naming instead."
♻️ Example restructuring
- // { role: "user", content: "..." } - if (role === "user" && content !== undefined) { - return { ...entry, type: "user", message: { role: "user", content } }; - } - - // { role: "tool", tool_call_id: "...", content: "..." } - if (role === "tool") { - const toolUseId = typeof entry.tool_call_id === "string" ? entry.tool_call_id : ""; - return { - ...entry, - type: "user", - message: { - role: "user", - content: [{ type: "tool_result", tool_use_id: toolUseId, content }], - }, - }; - } + if (role === "user" && content !== undefined) return normalizeUserEntry(entry, content); + if (role === "tool") return normalizeToolResultEntry(entry, content);🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/replay/jsonl-parser.ts` around lines 85 - 155, Refactor normalizeOpenAIEntry in jsonl-parser.ts to remove the WHAT-style inline comments by splitting the role-specific branches into small named helpers such as normalizeUserEntry, normalizeToolResultEntry, and normalizeAssistantEntry. Keep the existing behavior the same, but make the shape matching obvious through the helper names and use normalizeOpenAIEntry as the dispatcher so the code no longer relies on comments to explain each branch.Source: Coding guidelines
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@src/replay/jsonl-parser.ts`:
- Around line 85-155: Refactor normalizeOpenAIEntry in jsonl-parser.ts to remove
the WHAT-style inline comments by splitting the role-specific branches into
small named helpers such as normalizeUserEntry, normalizeToolResultEntry, and
normalizeAssistantEntry. Keep the existing behavior the same, but make the shape
matching obvious through the helper names and use normalizeOpenAIEntry as the
dispatcher so the code no longer relies on comments to explain each branch.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: df86f2ac-f07c-48c9-9dc2-0cf455ac5b14
📒 Files selected for processing (3)
src/replay/jsonl-parser.tstest/fixtures/jsonl/openai-format.jsonltest/replay.test.ts
Problem
parseJsonlTextonly recognizes Claude Code transcript format (typediscriminator + structuredmessage.contentarrays). Agents that emit JSONL in the OpenAI chat format — notably Hermes Agent — produce zero observations when imported via/replay import-jsonl.The parser silently drops every line because the OpenAI format uses top-level
role/contentfields andtool_callsarrays instead of thetypediscriminator.Impact
Hermes Agent already has a
connect/hermes.tsadapter and is listed as a supported agent, but its JSONL transcripts were invisible to replay. A real Hermes session of ~50 messages (prompts, tool calls, tool results, assistant responses) yields 0 observations without this fix.This means: no embeddings, no search index entries, no crystal/lesson derivation, and an empty timeline for any OpenAI-format transcript.
Solution
Add a
normalizeOpenAIEntry()function that runs as a pre-processing step inside the existingforloop, converting OpenAI chat messages to Claude Code format before the existing parser logic sees them.Entries already in Claude Code format (those with
entry.typeorentry.message) pass through untouched — zero risk of regressions.Mapped patterns
{role:"user", content: string}type:"user"+message.content{role:"tool", tool_call_id, content}type:"user"+tool_resultcontent block{role:"assistant", content, tool_calls}type:"assistant"+text+tool_useblocks{role:"assistant", content: string}type:"assistant"+textblocktool_calls[].function.arguments(always a JSON string in OpenAI format) is parsed to an object, matching how Claude Code storestool_use.input.Testing
parses OpenAI chat-format JSONL— verifies all 4 patterns with a fixturedoes not break Claude Code format when OpenAI entries are mixed in— verifies backward compatibilityFiles changed
src/replay/jsonl-parser.ts—normalizeOpenAIEntry()+ 4 fields added toJsonlEntryinterfacetest/fixtures/jsonl/openai-format.jsonl— new fixturetest/replay.test.ts— 2 new test casesSummary by CodeRabbit
New Features
Bug Fixes
Tests