feat(mcp-server): persist async job registry across restarts (#237) #251
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 041adceb07
    });

    jobId = handle.runId;
    persistSnapshot(ctx, jobId, handle);
Preserve original run input when writing async snapshots
dispatchStart immediately persists the synchronous handle returned by runner.fire, but that snapshot does not include the original request input, so this write can overwrite the runner’s own store row with an input-less record. Recovery later depends on record.input (packages/server/src/runner.ts, recover) to replay in-flight jobs, so a restart during a still-running job can replay with undefined input (or fallback static task input), producing wrong outputs or validation failures after restart.
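One way to avoid the overwrite described above is to merge the new snapshot with whatever the store already holds, so a stored input survives a later input-less write. The following is a minimal sketch under stated assumptions, not the project's implementation: `RunRecord`, `MemoryStore`, and `persistSnapshotPreservingInput` are hypothetical names, and the real `RunStore` interface in `@ageflow/server` may look quite different.

```typescript
// Hypothetical record shape; the real store row in @ageflow/server may differ.
interface RunRecord {
  runId: string;
  state: string;
  input?: unknown;
}

// Minimal in-memory stand-in for a durable store (assumption, for illustration only).
class MemoryStore {
  private rows = new Map<string, RunRecord>();

  get(runId: string): RunRecord | undefined {
    return this.rows.get(runId);
  }

  put(record: RunRecord): void {
    this.rows.set(record.runId, record);
  }
}

// Write a snapshot without dropping an input that is already stored.
// Precedence: input passed by the caller, then the input on the existing row.
function persistSnapshotPreservingInput(
  store: MemoryStore,
  snapshot: { runId: string; state: string },
  input?: unknown,
): RunRecord {
  const existing = store.get(snapshot.runId);
  const merged: RunRecord = {
    ...snapshot,
    input: input !== undefined ? input : existing?.input,
  };
  store.put(merged);
  return merged;
}

// Usage: the runner has already stored the input; a later input-less write keeps it.
const store = new MemoryStore();
store.put({ runId: "run-1", state: "running", input: { topic: "demo" } });
const afterWrite = persistSnapshotPreservingInput(store, {
  runId: "run-1",
  state: "running",
});
```

An alternative with the same effect would be to pass the original request input into the snapshot call at dispatch time, which avoids the extra read; the merge shown here only guards against the case where the caller has no input to hand.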
    if (h.pendingCheckpoint.recoveredFromStore === true) {
      throw new InvalidRunStateError(runId, h.state);
Don’t expose checkpoint state that cannot be resumed yet
Recovered checkpoint runs are rehydrated as state === "awaiting-checkpoint", but resume() explicitly rejects them while pendingCheckpoint.recoveredFromStore is true. After restart, clients can observe awaiting-checkpoint from status and still get INVALID_RUN_STATE from resume_workflow; if replay takes a long time to hit the checkpoint again, this becomes a prolonged false-ready state.
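One possible way to close the gap between what status reports and what resume accepts is to derive both from the same predicate. The sketch below is illustrative only: the `"recovering"` label, the `RunHandle` shape, and the helper names are assumptions, and only `state`, `pendingCheckpoint`, and `recoveredFromStore` come from the reviewed code.

```typescript
// Hypothetical state sets; "recovering" is an assumed public-only label.
type InternalState = "running" | "awaiting-checkpoint" | "done" | "failed";
type PublicState = InternalState | "recovering";

// Hypothetical handle shape, reduced to the fields the review mentions.
interface RunHandle {
  state: InternalState;
  pendingCheckpoint?: { recoveredFromStore?: boolean };
}

// True when the checkpoint was rehydrated from the store and replay has not reached it again.
function isRecoveredCheckpoint(h: RunHandle): boolean {
  return (
    h.state === "awaiting-checkpoint" &&
    h.pendingCheckpoint?.recoveredFromStore === true
  );
}

// Resume is allowed only for a live checkpoint.
function canResume(h: RunHandle): boolean {
  return h.state === "awaiting-checkpoint" && !isRecoveredCheckpoint(h);
}

// What a status call would report: never "awaiting-checkpoint" unless resume would succeed.
function publicState(h: RunHandle): PublicState {
  return isRecoveredCheckpoint(h) ? "recovering" : h.state;
}

// Usage: a recovered run is reported as recovering; a live one as awaiting-checkpoint.
const recovered: RunHandle = {
  state: "awaiting-checkpoint",
  pendingCheckpoint: { recoveredFromStore: true },
};
const live: RunHandle = {
  state: "awaiting-checkpoint",
  pendingCheckpoint: { recoveredFromStore: false },
};
const recoveredStatus = publicState(recovered);
const liveStatus = publicState(live);
```

With this shape, a client that sees `awaiting-checkpoint` can rely on resume being accepted, while a long replay shows up as a distinct state instead of a false-ready one.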
Closes #237
What changed
- @ageflow/server via RunStore + snapshot hydration/recovery path.
- @ageflow/server-sqlite with SQLite-backed RunStore implementation (Bun + Node runtime support).
- @ageflow/mcp-server async jobs to use durable job store:
  - jobStore abstraction and SQLite loader
  - jobDbPath support in server/programmatic paths
- mcp serve to expose async durable store option (--job-db).
- @ageflow/server-sqlite.
Tests
- bun run --filter @ageflow/server typecheck
- bun run --filter @ageflow/server test
- bun run --filter @ageflow/server-sqlite typecheck
- bun run --filter @ageflow/server-sqlite test
- bun run --filter @ageflow/mcp-server typecheck
- bun run --filter @ageflow/mcp-server test
- bun run --filter @ageflow/cli typecheck
- bun run --filter @ageflow/cli test
All passed locally.