26 changes: 26 additions & 0 deletions CHANGELOG.md
@@ -209,6 +209,32 @@ version 2: its `{{ … }}` sequences become substitution points, and its
(`ovos.utterance.speak` or `ovos.session.sync`).
## OVOS-SESSION-1 — Session Carrier Wire Shape

### 2

- §3.5 (new) — registers the user-preference fields `location`
(object), `system_unit`, `time_format`, `date_format` (strings):
the session origin's presentation preferences, defaulting per §2.1
and subject to the §3.4 wire-weight rule. The same section
deliberately declines to register transient audio state
(`is_speaking` / `is_recording`): per-device instant state is not
session state, implementations MUST NOT rely on it, and the
OVOS-AUDIO-1 output-lifecycle signals are authoritative.
- §3.2 — the six language signals given a one-purpose-per-field
summary table and split into preference (`lang`,
`secondary_langs`, `output_lang`) vs. per-utterance observation
(`stt_lang`, `request_lang`, `detected_lang`); stale
four-field count corrected.
- §3.1 — a remote participant SHOULD use a distinct `session_id`;
remote use of `"default"` SHOULD be gated behind an
elevated-privilege grant in layer-2 systems.
- §3.2.5 — the wake-word-derived `request_lang` hint cites its
observable source, `ovos.listener.wakeword` (OVOS-AUDIO-IN-1
§6.5).
- §3.4 — notes the OVOS-SESSION-2 §3.2 thin-session allowance for
intermediate emissions.
- §4.1 — default-materialization citation corrected to the
OVOS-MSG-1 §5.1–§5.2 derivations; phantom quotation dropped.

### 1

- The `context.session` carrier wire shape: the `session_id` and `lang`
91 changes: 77 additions & 14 deletions session-1.md
@@ -1,6 +1,6 @@
# Session Specification

**Spec ID:** OVOS-SESSION-1 · **Version:** 1 · **Status:** Draft
**Spec ID:** OVOS-SESSION-1 · **Version:** 2 · **Status:** Draft

This document defines the **wire shape** of the `session` carrier —
the JSON object that travels inside `Message.context.session` — and
@@ -213,6 +213,10 @@ session and persist across utterances.
| `blacklisted_dialog_transformers` | array of string | OVOS-TRANSFORM-1 §5.2 |
| `blacklisted_tts_transformers` | array of string | OVOS-TRANSFORM-1 §5.2 |
| `site_id` | string | OVOS-BRIDGE-1 §3.3 |
| `location` | object | §3.5 (this spec) |
| `system_unit` | string | §3.5 (this spec) |
| `time_format` | string | §3.5 (this spec) |
| `date_format` | string | §3.5 (this spec) |

Every field above is OPTIONAL on the wire. A producer that sets a
field **MUST** use the wire type listed and the value space defined
@@ -243,9 +247,13 @@ device** (remote-control commands, home-automation "speak" requests,
media injection from a layer-2 framework). Using `"default"` from a
remote client is deliberate impersonation of the device-local
session; whether that is authorized is a **layer-2 concern** outside
this specification. A layer-2 authentication system **MAY** gate
access to the default session behind an elevated-privilege flag (an
"admin" grant or equivalent); SESSION-1 places no requirement on it.
this specification. A remote participant **SHOULD** use a distinct
`session_id` of its own: the default session is the device's
persistent local state, and a remote peer writing into it collides
with the device owner's own interactions. In layer-2 systems,
remote use of `"default"` **SHOULD** be gated behind an
elevated-privilege grant (an "admin" grant or equivalent);
SESSION-1 itself places no requirement on the gate.
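
The recommendation above can be sketched as follows. This is a minimal illustration, not part of the specification: the helper names, the use of a UUID as the distinct identifier, and the boolean `has_admin_grant` are all assumptions, and the gate itself belongs to whatever layer-2 system is in use.

```python
import uuid

DEFAULT_SESSION_ID = "default"


def remote_session_id() -> str:
    """Mint a distinct session_id for a remote participant.

    Hypothetical helper: the spec only asks for a distinct id,
    a UUID is one convenient way to get one.
    """
    return str(uuid.uuid4())


def may_use_session(session_id: str, is_remote: bool, has_admin_grant: bool) -> bool:
    """Hypothetical layer-2 gate for the device-local default session.

    Remote use of "default" is allowed only with an elevated-privilege
    grant; every other combination passes through unchanged.
    """
    if is_remote and session_id == DEFAULT_SESSION_ID:
        return has_admin_grant
    return True
```

A local component keeps using `"default"` unconditionally; only the remote path is ever asked for a grant.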

`"default"` is also the value a consumer fills in whenever
`session_id` is omitted (§2.1). This means an absent `session`, an
@@ -271,10 +279,26 @@ other (§3), distinguished only by its identifier.
### 3.2 Language signals

A session carries up to six BCP-47 language-tag fields, each
naming a different *kind* of language signal. All four are
session-scoped, all four are omissible per §2, and all four are
populated independently (typically by different stages of the
pipeline, by different components, or by an out-of-band caller).
naming a different *kind* of language signal — one purpose per
field, no overlaps:

| Field | The one thing it records |
|-------|--------------------------|
| `lang` | the user's stable input-side language preference (§3.2.1) |
| `secondary_langs` | the ordered fallback pool of additional languages the user accepts (§3.2.2) |
| `output_lang` | the language the user wants responses rendered in, when it differs from the input side (§3.2.3) |
| `stt_lang` | the language the speech-to-text stage assumed for the audio (§3.2.4) |
| `request_lang` | the emitter's per-utterance hint of the expected language (§3.2.5) |
| `detected_lang` | a language detector's classification of the most recent utterance (§3.2.6) |

All six are session-scoped, all six are omissible per §2, and all
six are populated independently (typically by different stages of
the pipeline, by different components, or by an out-of-band
caller). Preference (`lang`, `secondary_langs`, `output_lang`) is
declared by the session origin and stable; observation (`stt_lang`,
`request_lang`, `detected_lang`) is written per utterance by the
stage that made it and may disagree with the preferences and with
each other — disagreement is signal, not error.
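
The preference/observation split can be illustrated with a small sketch. The function name and the sample session values are hypothetical; only the six field names and their grouping come from the text above. Absent fields are simply left out, since every field is omissible.

```python
PREFERENCE_FIELDS = ("lang", "secondary_langs", "output_lang")
OBSERVATION_FIELDS = ("stt_lang", "request_lang", "detected_lang")


def split_language_signals(session: dict) -> tuple:
    """Return (preferences, observations) for the language fields present.

    Hypothetical helper: it only groups the fields, it does not
    consolidate them into a single language.
    """
    preferences = {k: session[k] for k in PREFERENCE_FIELDS if k in session}
    observations = {k: session[k] for k in OBSERVATION_FIELDS if k in session}
    return preferences, observations


# Illustrative session: the detector disagrees with the stated preference.
example = {
    "session_id": "kitchen-tablet",
    "lang": "en-US",
    "stt_lang": "en-US",
    "detected_lang": "pt-PT",
}
preferences, observations = split_language_signals(example)
```

Here `detected_lang` disagrees with `lang`; the sketch keeps both values, so the disagreement among the six fields stays visible to whatever consumer reads them.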

Their **meanings** are normative; how a consumer **consolidates**
them into a single language for any given operation is not — that
@@ -390,7 +414,9 @@ Typical sources of `request_lang`:
- a **multi-wakeword** setup where each wake word is associated
with a language: the wakeword that triggered the capture
determines the reported hint (the user pressed an "English wake
word" so the emitter reports `en-US`);
word" so the emitter reports `en-US`). The detection itself is
observable as `ovos.listener.wakeword` (OVOS-AUDIO-IN-1 §6.5),
whose optional `lang` field carries the same binding;
- a UI lang selector the user toggled before speaking;
- a layer-2 router that knows the per-peer expected language.
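
For the multi-wakeword source in the list above, one possible shape of the emitter-side logic is sketched here. It assumes the `ovos.listener.wakeword` payload is available as a dict with the optional `lang` key at its top level; that layout and both function names are assumptions for illustration, not the OVOS-AUDIO-IN-1 definition.

```python
from typing import Optional


def request_lang_from_wakeword(wakeword_data: dict) -> Optional[str]:
    """Read the optional `lang` binding from a wake-word payload.

    Returns None when the field is absent or not a non-empty string,
    so the caller can leave `request_lang` unset.
    """
    lang = wakeword_data.get("lang")
    if isinstance(lang, str) and lang:
        return lang
    return None


def apply_request_lang(session: dict, wakeword_data: dict) -> dict:
    """Return a copy of the session with `request_lang` set from the wake word."""
    updated = dict(session)
    lang = request_lang_from_wakeword(wakeword_data)
    if lang is not None:
        updated["request_lang"] = lang
    return updated
```

When the wake word carries no language binding, the session is returned unchanged and `request_lang` stays omitted.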

@@ -522,6 +548,44 @@ specification places no maximum on session size.
Other specifications claiming session fields via §2.2 inherit
this rule for the fields they claim — they need not restate it.

Beyond per-field omission, OVOS-SESSION-2 §3.2 permits
intermediate/status emissions to carry a **thin** session of only
`{"session_id": ...}`; the full object is required only on dispatch
Messages and terminal lifecycle events.
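
A thin session can be derived from a full one in a single step. The sketch below is illustrative only: the function name is hypothetical, and the fallback to `"default"` mirrors the §2.1 rule for an omitted `session_id`.

```python
def thin_session(session: dict) -> dict:
    """Reduce a session to the thin form carried by intermediate emissions.

    Hypothetical helper: keeps only `session_id`, filling in "default"
    when the source session omitted it.
    """
    return {"session_id": session.get("session_id", "default")}
```

Dispatch Messages and terminal lifecycle events would still carry the full object; this reduction applies only to the intermediate/status case.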

### 3.5 User-preference fields

Four fields carry the session origin's **presentation preferences**,
so that a component answering a remote participant renders times,
dates, units, and place-relative answers for the *user's* locale
rather than the device's:

- `location` — object; the session origin's location preferences
(nested keys such as `city`, `coordinate`, and `timezone.code`,
e.g. `"America/Los_Angeles"`). An empty object is
wire-equivalent to omission.
- `system_unit` — string; measurement-system preference, e.g.
`"metric"` or `"imperial"`.
- `time_format` — string; time-rendering preference identifier,
e.g. `"full"` (24-hour clock) or `"half"` (12-hour clock).
- `date_format` — string; date-ordering preference identifier,
e.g. `"DMY"` or `"MDY"`.

All four follow §2.1: absence means the consumer falls back to the
deployment default. The §3.4 wire-weight rule applies: a producer
**SHOULD** omit a preference whose value matches the deployment
default.
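
The producer-side omission and consumer-side fallback can be sketched together. The deployment defaults shown are hypothetical example values, and the function names are made up for illustration; only the four field names and the empty-`location` equivalence come from this section.

```python
PREFERENCE_KEYS = ("location", "system_unit", "time_format", "date_format")

# Hypothetical deployment defaults, for illustration only.
DEPLOYMENT_DEFAULTS = {
    "location": {},
    "system_unit": "metric",
    "time_format": "full",
    "date_format": "DMY",
}


def prune_preferences(session: dict, defaults: dict) -> dict:
    """Producer side: drop preference fields that add no wire information.

    A field is dropped when it equals the deployment default, or when
    `location` is an empty object (wire-equivalent to omission).
    """
    pruned = dict(session)
    for key in PREFERENCE_KEYS:
        if key not in pruned:
            continue
        value = pruned[key]
        if value == defaults.get(key) or (key == "location" and value == {}):
            del pruned[key]
    return pruned


def resolve_preference(session: dict, key: str, defaults: dict):
    """Consumer side: an absent field falls back to the deployment default."""
    return session.get(key, defaults[key])
```

A session whose user prefers a 12-hour clock but is otherwise default-valued would therefore carry only `time_format` among the four fields.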

**Deliberately unregistered: transient audio state.** This
specification does not register per-device transient audio-state
fields such as `is_speaking` / `is_recording` booleans. They
describe the device at an instant, not the session, and a value
snapshotted into a propagating session object is stale by the time
any consumer reads it. Implementations **MUST NOT** rely on such
fields if present on a session; the OVOS-AUDIO-1 output-lifecycle
signals are the authoritative surface for speaking status.
Consumers tolerate their presence under §2.4's unknown-field rule.
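
One way for a consumer to tolerate these fields without relying on them is to drop them from the view its decision logic sees. This is a sketch of one possible approach, not something the specification requires; the function name is hypothetical.

```python
TRANSIENT_AUDIO_FIELDS = frozenset({"is_speaking", "is_recording"})


def without_transient_audio_state(session: dict) -> dict:
    """Return a copy of the session with transient audio-state fields removed.

    Hypothetical consumer-side helper: the fields are accepted on input
    but never reach code that could mistake them for current device state.
    """
    return {k: v for k, v in session.items() if k not in TRANSIENT_AUDIO_FIELDS}
```

Speaking status would then be taken from the OVOS-AUDIO-1 output-lifecycle signals rather than from anything in the session.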

---

## 4. Propagation
@@ -546,11 +610,10 @@ For the avoidance of doubt:

### 4.1 Default materialization

OVOS-MSG-1 §4.1 permits an implementation to **materialize** a
default session on a derived Message when the source Message had no
`session`. That section permits "any device-local fields the
implementation chooses"; this specification narrows that permission
for the field set §3 claims. A materialized default **MUST** set `session_id: "default"`. A
The derivations of OVOS-MSG-1 §5.1–§5.2 permit an implementation
to **materialize** a default session on a derived Message when the
source Message had no `session`; this specification narrows that
permission for the field set §3 claims. A materialized default **MUST** set `session_id: "default"`. A
materialized default **MUST NOT** populate any field whose
deployment default is a deployment-configured or "no behaviour"
value — those fields carry meaning only when explicitly set by the