Merged
13 changes: 13 additions & 0 deletions .instar/instar-dev-decisions/2026-06-26T09-28-48-199Z-unknown.json
@@ -0,0 +1,13 @@
{
"ts": "2026-06-26T09:28:48.199Z",
"slug": "unknown",
"suggestedTier": 2,
"declaredTier": 1,
"riskFloor": 1,
"riskFloorReasons": [],
"belowFloor": false,
"files": 2,
"loc": 107,
"causalAutopsy": null,
"verdict": "pass"
}
36 changes: 36 additions & 0 deletions docs/specs/inbound-delivery-sacred.eli16.md
@@ -0,0 +1,36 @@
# Inbound Delivery Is Sacred — Plain-English Overview

## What broke

When you send me a message and I can't hand it to a live session right away (for
example while a conversation is moving between machines), it can go into a small
durable "holding queue" so it isn't lost. The queue does its job: it keeps your
message and, if it eventually has to give up on one, it writes down "I didn't get
to this."

The problem was the LAST step — telling YOU. That "I didn't get to your messages"
notice was sent to a single internal "attention" channel, and if that channel
wasn't set up, the notice was just... dropped. Silently. So your message could be
lost AND you'd never be told. That's exactly the "why aren't you responding?"
failure from the bad night of 2026-06-25: the message died quietly in the queue.

## What this change does

It makes the loss notice come back to YOU, in the actual conversation you sent the
message from. Every held message already remembers which conversation it came from,
so now if I have to give up on it, you get a plain note right there: "I didn't get
to N of your messages — resend anything still needed."

And for the rare case where a lost message can't be tied to a conversation AND
there's no fallback channel set up, instead of dropping it silently, I now make it
loud — it's written to my error log so it can never just vanish. A lost message of
yours is never silent.

## What you'll notice

Almost nothing changes day-to-day, because this safety queue ships switched off by
default — it only runs if it's deliberately turned on. What changes is the
guarantee: when the queue IS in use and it can't deliver one of your messages, you
hear about it in your own conversation, not in some side channel you might never
see. The point is simple — a message you send me either gets through, or you're
told it didn't. Never silence.
39 changes: 39 additions & 0 deletions docs/specs/inbound-delivery-sacred.md
@@ -0,0 +1,39 @@
# Inbound Delivery Is Sacred — loss notices reach the user (F3)

**Status:** draft (Tier-1 instar-dev; the PR is the review surface).
**Constitution:** *The User Experience Is the Product* → sub-standard #3 **Inbound Delivery Is Sacred**.
**Earned from:** 2026-06-25 (postmortem Failure 3) — the durable inbound holding-queue captured the user's Telegram messages "to keep them safe", then expired them after retries WITHOUT the user ever hearing about it. "Why aren't you responding?" died in the queue.

## What already exists (do NOT rebuild)

A codebase sweep (2026-06-26) found that two of the three things the postmortem implied were missing already exist in code:

1. **Loss-detection is complete.** `QueueDrainLoop` calls `reportLoss`/`reportPossiblyNotInjected` on EVERY terminal-expiry path (ttl-expired, attempts-exhausted, stale-custody, poisoned, overflow, …). No silent drop in the queue engine itself.
2. **The fail-OPEN fallback is comprehensive.** The Telegram inbound handler falls through to the direct-inject path on every uncertain case (dry-run, not-lease-holder, storage-failure, dark engine, route-throw). The corollary "a half-built net fails OPEN, never capture-and-drop" is satisfied in code.
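Point 2's fail-open rule can be sketched in a few lines. This is a minimal illustration, not the real Telegram handler: `deliverInbound`, `RouteOutcome`, and the callback shapes are hypothetical, and only the outcome labels come from the list above. What it shows is the shape of the guarantee: the queue takes custody only on a definite success, and every other result, including a throw from the router, falls through to direct inject.

```typescript
// Hypothetical result of offering an inbound message to the durable queue.
// Labels mirror the uncertain cases listed above; the real handler's types differ.
type RouteOutcome = 'queued' | 'dry-run' | 'not-lease-holder' | 'storage-failure' | 'dark-engine';

type Delivery = 'held-in-queue' | 'direct-inject';

// Fail OPEN: only a definite 'queued' means the queue now owns the message.
// Any other outcome, or a throw from the router, falls through to direct inject.
function deliverInbound(route: () => RouteOutcome, directInject: () => void): Delivery {
  let queued = false;
  try {
    queued = route() === 'queued';
  } catch {
    // route-throw: the outcome is uncertain, so it is never treated as captured.
  }
  if (queued) return 'held-in-queue';
  directInject();
  return 'direct-inject';
}
```

With this shape, a new uncertain outcome added to the union needs no new branch, because anything that is not `'queued'` already fails open.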

## The actual gap (what this fixes)

The loud-failure **channel**. Every loss report funnels into `notify(tier, category, message)`, which resolves the destination to a single `agent-attention-topic` state key — and **silently skips the Telegram send when that key is unset** (`resolvedTopicId === 0`). Loss notices were also SUMMARY-batched and routed to the attention topic, **never to the topic the user actually messaged from**, even though every loss item carries `sessionKey` (the originating topic id).

So a lost inbound message could still die quietly: if no attention topic is configured, the "I didn't get to your messages" notice is dropped.
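To make the gap concrete, here is a minimal model of the pre-fix path, assuming a simplified `notify` that only knows the attention-topic key. Only `agent-attention-topic` and the `resolvedTopicId === 0` check come from this spec; `legacyNotify`, the `Map`-based state, and the `sent` outbox are hypothetical stand-ins.

```typescript
// Minimal model of the pre-fix loss-notice path (hypothetical names, see above).
interface SentNotice { topicId: number; message: string }

function legacyNotify(state: ReadonlyMap<string, number>, message: string, sent: SentNotice[]): void {
  // The destination is always the single attention-topic key, never the originating topic.
  const resolvedTopicId = state.get('agent-attention-topic') ?? 0;
  if (resolvedTopicId === 0) return; // silent skip: nothing sent, nothing logged
  sent.push({ topicId: resolvedTopicId, message });
}

// With no attention topic configured, the loss notice simply vanishes.
const outbox: SentNotice[] = [];
legacyNotify(new Map(), "I didn't get to 2 queued message(s)", outbox);
// outbox.length === 0
```

The originating `sessionKey` never reaches this function, so even with an attention topic configured the user could only ever receive a batched summary in that side channel, not a notice in their own conversation.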

## The change

A pure router + a thin server helper, applied to every inbound-queue loss-notice site:

- **`planInboundLossNotices(items)`** (`src/core/inboundLossRouting.ts`, pure + unit-tested): groups loss items by `Number(sessionKey)` → `{ perTopic: [{topicId, count}], unresolved }`. A non-numeric/zero/negative sessionKey is `unresolved` (never silently assigned to a topic).
- **`notifyInboundLoss(items, tier, buildMessage)`** (server.ts): emits a per-ORIGINATING-topic notice (`notify(tier, 'inbound-loss', msg, topicId)`) so each user hears about THEIR lost messages, in THEIR topic, on the proven Telegram path. Items with no resolvable topic fall back to the attention topic; if that is ALSO unset, the loss is surfaced **loudly** (`console.error`) — the one seam where a loss could otherwise go silent is closed.
- Applied to all 5 inbound-queue loss sites: boot-sweep `reportLoss`/`reportPossiblyNotInjected`, the no-mesh-identity dropped path, and the drain-loop `reportLoss`/`reportPossiblyNotInjected`. (The `stuck-recovery` turn-incomplete notice already routes per-topic — unchanged.)
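The routing in `notifyInboundLoss` can also be written as a pure decision over the loss items, which makes the loud-fallback branch checkable without a live `notify`. The sketch below is hypothetical (`decideLossDelivery` and `LossDelivery` are not in the codebase); it reuses the grouping rule of `planInboundLossNotices` and the attention-topic fallback described above.

```typescript
// Hypothetical pure restatement of notifyInboundLoss's branching. Grouping mirrors
// planInboundLossNotices: a sessionKey that parses to a positive finite number is a
// topic id; anything else is unresolved. The LossDelivery shape is illustrative.
type LossItem = { sessionKey: string };

type LossDelivery =
  | { kind: 'topic'; topicId: number; count: number } // notice in the originating topic
  | { kind: 'attention'; topicId: number; count: number } // unresolved items, attention topic set
  | { kind: 'loud'; count: number }; // unresolved items, no attention topic: console.error

function decideLossDelivery(items: ReadonlyArray<LossItem>, attentionTopic: number): LossDelivery[] {
  const byTopic = new Map<number, number>();
  let unresolved = 0;
  for (const it of items) {
    const tid = Number(it.sessionKey);
    if (Number.isFinite(tid) && tid > 0) byTopic.set(tid, (byTopic.get(tid) ?? 0) + 1);
    else unresolved++;
  }
  // One notice per originating topic, ascending topic id for deterministic output.
  const out: LossDelivery[] = [...byTopic.entries()]
    .sort((a, b) => a[0] - b[0])
    .map(([topicId, count]): LossDelivery => ({ kind: 'topic', topicId, count }));
  if (unresolved > 0) {
    // attentionTopic === 0 means "unset", matching the `?? 0` default in server.ts.
    out.push(
      attentionTopic !== 0
        ? { kind: 'attention', topicId: attentionTopic, count: unresolved }
        : { kind: 'loud', count: unresolved },
    );
  }
  return out;
}

// An unresolved loss with no attention topic becomes a 'loud' delivery, not silence.
const example = decideLossDelivery([{ sessionKey: '28744' }, { sessionKey: 'legacy' }], 0);
// example: [{ kind: 'topic', topicId: 28744, count: 1 }, { kind: 'loud', count: 1 }]
```

Every loss item ends up in exactly one delivery, so "silently dropped" is not a representable outcome of this decision.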

## Scope / safety

The inbound queue ships **dark** (`inboundQueueConfig` `enabled:false`), so this code only runs when the queue is explicitly enabled — it hardens the channel for when it IS live, with no behavior change while dark. No new blocking authority; this is a delivery-routing change (a signal-delivery improvement, not a gate).

## Tests

- Unit: `planInboundLossNotices` — routes to originating topic, unresolved counting, zero/negative/non-numeric → unresolved, deterministic order, empty input. (6 cases.)
- The server helper has no direct unit test. Its loud-fallback path is covered by the side-effects review plus the pure-function tests above, which pin down the `unresolved` count that selects it (the one `console.error` surface).

## Reconcile (open)

At v1.3.671+ the queue ships dark, so the production capture-and-drop the postmortem observed was likely the PendingInjectStore path or a non-dark deploy — worth confirming which holding-queue was live during the incident. This change hardens the inbound-queue channel regardless.
61 changes: 47 additions & 14 deletions src/commands/server.ts
@@ -22,6 +22,7 @@ const __dirname = path.dirname(fileURLToPath(import.meta.url));
const execFileAsync = promisify(execFile);
import { loadConfig, ensureStateDir, detectTmuxPath, detectGeminiPath } from '../core/Config.js';
import { handleProcessLevelError } from '../core/uncaughtExceptionPolicy.js';
import { planInboundLossNotices } from '../core/inboundLossRouting.js';
import { configureHostSpawnSemaphore } from '../core/hostSpawnSemaphore.js';
import { SingleInstanceLock, installReleaseHandlers } from '../core/SingleInstanceLock.js';
import { resolveDevAgentGate, resolveStateSyncStores } from '../core/devAgentGate.js';
@@ -3078,6 +3079,42 @@ export async function startServer(options: StartOptions): Promise<void> {
}
}

/**
* F3 (Inbound Delivery Is Sacred): route an inbound-queue loss notice to each
* ORIGINATING topic on the proven Telegram path (each loss item's sessionKey
* IS the topic id the user messaged from), instead of the single attention
* topic that `notify()` SILENTLY DROPS when unset. Items with no resolvable
* numeric topic fall back to the attention topic; if THAT is also unset, the
* loss is surfaced LOUDLY (console.error) — a lost inbound user message is
* NEVER silently expired (the Inbound-Delivery-Is-Sacred corollary). Spec:
* docs/specs/inbound-delivery-sacred.md.
*/
function notifyInboundLoss(
items: ReadonlyArray<{ sessionKey: string }>,
tier: NotificationTier,
buildMessage: (count: number) => string,
): void {
if (items.length === 0) return;
const plan = planInboundLossNotices(items);
// Each affected topic gets its OWN notice, in that topic (proven path).
for (const { topicId, count } of plan.perTopic) {
notify(tier, 'inbound-loss', buildMessage(count), topicId);
}
if (plan.unresolved > 0) {
const attn = _notifyState?.get<number>('agent-attention-topic') ?? 0;
if (attn) {
notify(tier, 'inbound-loss', buildMessage(plan.unresolved), attn);
} else {
// No resolvable topic AND no attention topic configured → the one place a
// loss could go silent. Surface it LOUDLY instead (never a silent expiry).
console.error(
`[inbound-loss] ${plan.unresolved} lost inbound message(s) had no resolvable topic and no attention topic is ` +
`configured — surfaced loudly per Inbound-Delivery-Is-Sacred (configure an attention topic so these reach you).`,
);
}
}
}

/**
* Translate coherence check failures into human-readable, actionable messages.
*/
@@ -8292,14 +8329,12 @@ export async function startServer(options: StartOptions): Promise<void> {
hasPisRecord: (sk) => pisRecordsForTopic(sk).length > 0,
clearPisRecord: (sk) => { for (const r of pisRecordsForTopic(sk)) sweepPis.clear(r.tmuxSession); },
reportLoss: (items, reason) => {
-const topics = [...new Set(items.map((i) => i.sessionKey))].join(', ');
-notify('SUMMARY', 'inbound-queue',
-`I didn't get to ${items.length} queued message(s) (${reason}; topics: ${topics}) — resend anything still needed.`);
+notifyInboundLoss(items, 'SUMMARY', (count) =>
+`I didn't get to ${count} of your message(s) (${reason}) — resend anything still needed.`);
},
reportPossiblyNotInjected: (items) => {
-const topics = [...new Set(items.map((i) => i.sessionKey))].join(', ');
-notify('SUMMARY', 'inbound-queue',
-`${items.length} message(s) may not have been injected before a crash (topics: ${topics}) — if a message went unanswered, resend it.`);
+notifyInboundLoss(items, 'SUMMARY', (count) =>
+`${count} of your message(s) may not have been injected before a crash — if a message went unanswered, resend it.`);
},
raiseAttention: (title, body) => notify('IMMEDIATE', 'inbound-queue', `${title}: ${body}`),
log: (line) => console.log(pc.dim(` ${line}`)),
@@ -8333,8 +8368,8 @@ export async function startServer(options: StartOptions): Promise<void> {
}
}
if (dropped.length > 0) {
-notify('SUMMARY', 'inbound-queue',
-`I didn't get to ${dropped.length} queued message(s) (the queue is enabled but this machine has no mesh identity, so the drain never started; topics: ${[...new Set(dropped)].join(', ')}) — resend anything still needed.`);
+notifyInboundLoss(dropped.map((sk) => ({ sessionKey: sk })), 'SUMMARY', (count) =>
+`I didn't get to ${count} of your message(s) (the queue is enabled but this machine has no mesh identity, so the drain never started) — resend anything still needed.`);
}
store.close();
_sweptInboundStore = null;
@@ -18335,14 +18370,12 @@ export async function startServer(options: StartOptions): Promise<void> {
}
},
reportLoss: (items, reason) => {
-const topics = [...new Set(items.map((i) => i.sessionKey))].join(', ');
-notify('SUMMARY', 'inbound-queue',
-`I didn't get to ${items.length} queued message(s) (${reason}; topics: ${topics}) — resend anything still needed.`);
+notifyInboundLoss(items, 'SUMMARY', (count) =>
+`I didn't get to ${count} of your message(s) (${reason}) — resend anything still needed.`);
},
reportPossiblyNotInjected: (items) => {
-const topics = [...new Set(items.map((i) => i.sessionKey))].join(', ');
-notify('SUMMARY', 'inbound-queue',
-`${items.length} message(s) may not have been injected (topics: ${topics}) — if a message went unanswered, resend it.`);
+notifyInboundLoss(items, 'SUMMARY', (count) =>
+`${count} of your message(s) may not have been injected — if a message went unanswered, resend it.`);
},
log: (line) => console.log(pc.dim(` ${line}`)),
reportDegradation: (reason) => {
46 changes: 46 additions & 0 deletions src/core/inboundLossRouting.ts
@@ -0,0 +1,46 @@
/**
* F3 (Inbound Delivery Is Sacred) — pure routing plan for inbound-queue loss
* notices. A lost inbound user message must reach the user who sent it (each
* loss item's `sessionKey` IS the topic id they messaged from) OR, if it has no
* resolvable destination, be surfaced LOUDLY — never silently expired. This pure
* function decides the routing; server.ts does the actual notify()/loud-surface.
* Constitution: "The User Experience Is the Product" → sub-standard #3 Inbound
* Delivery Is Sacred. Spec: docs/specs/inbound-delivery-sacred.md.
*/

export interface InboundLossRoutePlan {
/** Per-ORIGINATING-topic notice — each affected topic + how many of its
* messages were lost (delivered IN that topic, the proven path). */
perTopic: Array<{ topicId: number; count: number }>;
/** Count of lost items whose sessionKey is not a resolvable numeric topic.
* These fall back to the attention topic; if that is unset too, they MUST be
* surfaced loudly (the one seam where a loss could otherwise go silent). */
unresolved: number;
}

/**
* Group inbound loss items by their originating topic. Pure: same input → same
* output. A sessionKey that is a positive finite number is a topic id; anything
* else (empty, non-numeric, legacy single-file key) is counted as `unresolved`.
*/
export function planInboundLossNotices(
items: ReadonlyArray<{ sessionKey: string }>,
): InboundLossRoutePlan {
const byTopic = new Map<number, number>();
let unresolved = 0;
for (const it of items) {
const tid = Number(it.sessionKey);
if (Number.isFinite(tid) && tid > 0) {
byTopic.set(tid, (byTopic.get(tid) ?? 0) + 1);
} else {
unresolved++;
}
}
return {
// Deterministic order (ascending topic id) for stable notices + tests.
perTopic: [...byTopic.entries()]
.sort((a, b) => a[0] - b[0])
.map(([topicId, count]) => ({ topicId, count })),
unresolved,
};
}
49 changes: 49 additions & 0 deletions tests/unit/inboundLossRouting.test.ts
@@ -0,0 +1,49 @@
/**
* F3 (Inbound Delivery Is Sacred) — pure loss-routing tests (Tier 1).
* Covers the decision boundary: a lost inbound message routes to its ORIGINATING
* topic (sessionKey = topic id), and a message with no resolvable topic is
* counted as `unresolved` (which the caller must surface loudly, never drop).
* Spec: docs/specs/inbound-delivery-sacred.md.
*/
import { describe, it, expect } from 'vitest';
import { planInboundLossNotices } from '../../src/core/inboundLossRouting.js';

describe('planInboundLossNotices', () => {
it('routes each loss to its originating topic (sessionKey = topic id)', () => {
const plan = planInboundLossNotices([{ sessionKey: '28744' }, { sessionKey: '28744' }, { sessionKey: '100' }]);
expect(plan.perTopic).toEqual([
{ topicId: 100, count: 1 },
{ topicId: 28744, count: 2 },
]);
expect(plan.unresolved).toBe(0);
});

it('a non-numeric sessionKey is unresolved (never silently assigned to a topic)', () => {
const plan = planInboundLossNotices([{ sessionKey: 'legacy-single-file' }, { sessionKey: '' }]);
expect(plan.perTopic).toEqual([]);
expect(plan.unresolved).toBe(2);
});

it('mixes resolvable + unresolved correctly', () => {
const plan = planInboundLossNotices([{ sessionKey: '5' }, { sessionKey: 'x' }, { sessionKey: '5' }]);
expect(plan.perTopic).toEqual([{ topicId: 5, count: 2 }]);
expect(plan.unresolved).toBe(1);
});

it('zero/negative sessionKey is treated as unresolved (a topic id is positive)', () => {
const plan = planInboundLossNotices([{ sessionKey: '0' }, { sessionKey: '-3' }]);
expect(plan.perTopic).toEqual([]);
expect(plan.unresolved).toBe(2);
});

it('empty input → empty plan', () => {
const plan = planInboundLossNotices([]);
expect(plan.perTopic).toEqual([]);
expect(plan.unresolved).toBe(0);
});

it('per-topic order is deterministic (ascending topic id)', () => {
const plan = planInboundLossNotices([{ sessionKey: '900' }, { sessionKey: '12' }, { sessionKey: '300' }]);
expect(plan.perTopic.map((p) => p.topicId)).toEqual([12, 300, 900]);
});
});
42 changes: 42 additions & 0 deletions upgrades/next/inbound-delivery-sacred.md
@@ -0,0 +1,42 @@
# Inbound Delivery Is Sacred (Postmortem F3)

**Slug:** `inbound-delivery-sacred` · **Maturity:** 🧪 Preview (hardens a dark feature; no behavior change while the inbound queue is off) · **Audience:** agent-only

## What Changed

The durable inbound holding-queue's loss notices ("I didn't get to your messages")
were delivered to a single internal attention-topic and **silently dropped when that
topic was unset**, so a held inbound user message could expire without the user ever being
told (postmortem Failure 3, the "why aren't you responding?" failure). This routes each
loss notice to the **originating topic** the user actually messaged from (every loss
item carries its `sessionKey` = topic id), and surfaces LOUDLY (`console.error`) the one
residual case where a loss has no resolvable topic and no attention topic is configured.
A lost inbound message is now never silently expired.

New pure router `planInboundLossNotices` (`src/core/inboundLossRouting.ts`, unit-tested)
+ a `notifyInboundLoss` helper in server.ts, applied to all 5 inbound-queue loss sites.
Constitution: *The User Experience Is the Product* → sub-standard #3.

## What to Tell Your User

Nothing changes day-to-day — the inbound holding-queue ships **dark** (off by default),
so this code only runs when it is deliberately enabled. The guarantee it adds: when the
queue IS in use and can't deliver one of your messages, you hear about it **in your own
conversation**, not in a side channel you might never see. A message you send either
gets through or you're told it didn't — never silence.

## Summary of New Capabilities

- Inbound-queue loss notices now route to the originating topic (per-user, in their
conversation) instead of a single attention topic.
- A loss with no resolvable topic and no attention topic configured is surfaced loudly
(error log) instead of silently dropped.
- No new config, route, or surface; no behavior change while the inbound queue is dark.

## Evidence

- 6 unit tests for the pure router (`tests/unit/inboundLossRouting.test.ts`) — originating-topic
routing, unresolved counting, zero/negative/non-numeric → unresolved, deterministic order.
- Full `npm run lint` (tsc + ~20 lint scripts) exits 0; clean tsc.
- Side-effects review: `upgrades/side-effects/inbound-delivery-sacred.md`.
- Spec: `docs/specs/inbound-delivery-sacred.md`.