diff --git a/.changeset/idmatch-impression-tracker-impl-reference.md b/.changeset/idmatch-impression-tracker-impl-reference.md new file mode 100644 index 0000000000..b715467ed2 --- /dev/null +++ b/.changeset/idmatch-impression-tracker-impl-reference.md @@ -0,0 +1,6 @@ +--- +--- + +Add `docs/trusted-match/impression-tracker-implementation.mdx` — non-normative implementation reference for the impression tracker that sits behind the cap-fire boundary contract. Covers cross-identity dedup via `impression_id`, the `fcap_keys` label model, the log-based reference data model from `adcp-go/targeting/`, SDK primitives (`decodeTmpx` + `writeExposure`), production topology, and two end-to-end conformance scenarios (multi-identity dedup and cross-seller advertiser cap). Cross-links from `identity-match-implementation.mdx` so readers can find it. + +This re-introduces, as non-normative impl reference, the impression-tracker mechanics that were originally proposed as normative architecture in `bokelley/idmatch-design` but were superseded on `main` by the narrower cap-fire boundary contract (#4070). The boundary contract stays normative; this page documents one valid way to implement the impression tracker behind it. 
diff --git a/docs/trusted-match/identity-match-implementation.mdx b/docs/trusted-match/identity-match-implementation.mdx index d333d80162..1f1c36c37d 100644 --- a/docs/trusted-match/identity-match-implementation.mdx +++ b/docs/trusted-match/identity-match-implementation.mdx @@ -109,6 +109,7 @@ Today the cap-state store is keyed at `(user_identity, seller_agent_url, package ## See also - [TMP Specification](/docs/trusted-match/specification) — wire spec, TMPX format, conformance invariants +- [Impression Tracker Implementation Reference](/docs/trusted-match/impression-tracker-implementation) — non-normative reference for the impression-tracker side of the boundary (multi-identity dedup via `impression_id`, fcap_keys label model, log-based reference data model, SDK primitives) - [Buyer Guide](/docs/trusted-match/buyer-guide) — buyer agent integration, Context Match + Identity Match flows - [Migration from AXE](/docs/trusted-match/migration-from-axe) — for buyers transitioning from AXE-shaped pipelines, including the OpenRTB User.eids cross-walk - [Privacy architecture](/docs/trusted-match/privacy-architecture) — what each party learns diff --git a/docs/trusted-match/impression-tracker-implementation.mdx b/docs/trusted-match/impression-tracker-implementation.mdx new file mode 100644 index 0000000000..d5c6f7e66b --- /dev/null +++ b/docs/trusted-match/impression-tracker-implementation.mdx @@ -0,0 +1,280 @@ +--- +title: Impression Tracker Implementation Reference +sidebarTitle: Impression Tracker Reference +description: "Non-normative reference for the buyer-internal impression tracker — multi-identity dedup, fcap_keys label model, and the path from an impression pixel to a cap-fire entry at the Identity Match boundary." 
+"og:title": "AdCP TMP Impression Tracker Implementation Reference" +--- + +# Impression Tracker Implementation Reference + +This page is **non-normative reference content** for the impression tracker that sits behind the [Frequency-Cap Data Flow](/docs/trusted-match/identity-match-implementation) boundary. The protocol only constrains: + +- The wire spec — see the [TMP specification](/docs/trusted-match/specification). +- The conformance invariants the Identity Match service must satisfy — also normative in the [TMP specification](/docs/trusted-match/specification#conformance-invariants-for-identitymatch-eligibility). +- The cap-fire boundary contract — defined in [Frequency-Cap Data Flow](/docs/trusted-match/identity-match-implementation). + +Everything on this page is buyer-internal: how the impression tracker counts impressions, deduplicates across resolved identities, evaluates windows, and decides when a cap fires. Buyers running a conformant impression tracker may pick any approach that produces correct cap-fire events at the boundary. This page documents one such approach — the one implemented in [`adcp-go/targeting`](https://github.com/adcontextprotocol/adcp-go/tree/main/targeting) — so other implementers have a worked reference. + +## The cross-identity dedup problem + +A single impression on a user is often resolved to multiple identities (RampID, ID5, MAID, UID2, publisher-issued tokens, etc.) inside the same TMPX. A naive impression tracker that counts per-identity will count one impression as 2–3 against the user's caps. If the buyer runs an identity graph, the buyer can canonicalize identities before counting; if the buyer is graphless or partially graphed (common — Scope3's hosted Identity Match is graphless), no canonical id exists. + +Counter-based approaches paper over this with a `merge_rule` (MAX / OR / SUM) when reading per-identity counters. None of the merge rules is correct in general. 
The pathological case is identity-resolution toggling across impressions: some impressions resolve `rampid` only, some resolve both `rampid` and `id5`. A MAX-merged counter under-counts; SUM over-counts; OR can't represent more-than-one. The cap fires at the wrong time either way. + +The reference impl avoids the merge-rule problem entirely with an `impression_id` scheme: one id per impression, written to every resolved identity's log, deduplicated by id at read time. The count is exact regardless of whether identities are canonicalized upstream. + +## impression_id rules + +The impression tracker generates one `impression_id` per impression at TMPX decode time and writes it to every resolved identity's log. At read time, scanning all of a user's identity logs and deduplicating by `impression_id` recovers the distinct-impression count exactly. + +Required properties: + +1. **Globally unique across all sellers, sources, and time.** A buyer agent serves impressions sourced from many sellers. Collisions across sellers would silently merge distinct impressions and under-count the cap. Use UUIDv4 (≥122 bits randomness) or an equivalent collision-resistant generator. +2. **Generated by the buyer's impression tracker at TMPX decode** — not by the seller, the publisher, the router, or the TMPX nonce. The TMPX nonce is per-Identity-Match-evaluation and shared across all impressions in the serve window; seller- or publisher-supplied IDs would collide. +3. **One id per impression, written to ALL of the user's resolved identity logs for that impression.** Generating a different id per identity breaks the dedup contract — the same impression would count once per resolved identity. +4. **Pixel retries are a separate concern.** The same pixel firing twice (network retry, page refresh, etc.) MUST NOT mint two `impression_id`s. 
Either dedupe incoming requests by an idempotency key in the pixel URL or `Idempotency-Key` header, or accept a small over-count from retries as benign for fcap purposes. Cross-identity dedup and per-pixel idempotency are different problems with different mitigations. + +## fcap_keys label model + +Caps are tagged with `dimension:value` labels at impression-write time. Packages declare which labels they map to; fcap policies attach `(window_sec, max_count)` to each label. + +``` +package 2342: fcap_keys ["campaign:42", "campaign_group:7", "advertiser:13"] +policy "campaign:42": {window_sec: 60, max_count: 5} +policy "campaign_group:7": {window_sec: 86400, max_count: 50} +policy "advertiser:13": {window_sec: 86400, max_count: 20} +``` + +When the impression tracker writes an exposure for an impression on package 2342, the entry's `fcap_keys` is `["campaign:42", "campaign_group:7", "advertiser:13"]`. When evaluating whether a cap has fired, it scans the log for entries matching each label within that policy's window. + +**Charset constraint.** Each segment matches `[a-zA-Z0-9_-]+` so the `:` delimiter is unambiguous. URL-bearing or otherwise colon-bearing values must be hashed or shortened. + +**Multi-tenant operators** typically adopt a tenant prefix (`buyer-acme:campaign:42`) as a deployment convention to prevent key collisions across advertiser orgs on shared state. This is operator policy, not protocol. + +**Why labels, not hierarchy.** Cap dimensions are heterogeneous across customers — some cap at creative, some at line item, some at advertiser-roll-up. A fixed schema either over-prescribes or under-serves. Labels also make cross-seller caps automatic: any policy whose key is shared across sellers (e.g., `buyer-acme:advertiser:13`) enforces across all of them with no extra mode. Cross-cutting policies are explicit — a campaign that needs both per-campaign and per-advertiser caps declares both keys and gets two policy lookups. 
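The label rules above can be sketched in a few lines. This is a minimal illustration, not the SDK surface: `make_fcap_key` and `policies_for_package` are hypothetical helper names, and the in-memory `policies` dict stands in for whatever policy store a deployment actually uses. Only the charset rule and the sample policy values come from the text above.

```python
import re

# Charset rule from the text: each segment matches [a-zA-Z0-9_-]+ so that
# ':' stays an unambiguous delimiter. The helper names are hypothetical.
SEGMENT = re.compile(r"[a-zA-Z0-9_-]+")


def make_fcap_key(*segments: str) -> str:
    """Join validated segments with ':' (optional tenant, dimension, value)."""
    for seg in segments:
        if not SEGMENT.fullmatch(seg):
            raise ValueError(f"invalid fcap_key segment: {seg!r}")
    return ":".join(segments)


def policies_for_package(fcap_keys, policies):
    """Return the policy attached to each label that has one (one lookup per key)."""
    return {key: policies[key] for key in fcap_keys if key in policies}


# Sample data mirroring the package 2342 example above.
policies = {
    "campaign:42": {"window_sec": 60, "max_count": 5},
    "campaign_group:7": {"window_sec": 86400, "max_count": 50},
    "advertiser:13": {"window_sec": 86400, "max_count": 20},
}
package_2342 = [
    make_fcap_key("campaign", "42"),
    make_fcap_key("campaign_group", "7"),
    make_fcap_key("advertiser", "13"),
]
active = policies_for_package(package_2342, policies)
```

A tenant prefix is just one more validated segment, for example `make_fcap_key("buyer-acme", "campaign", "42")`. A URL-bearing value such as `https://example.com` is rejected because it contains `:` and `/`, which is why the text recommends hashing or shortening such values first.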
+ +## Reference data model (valkey-backed, log-based) + +The layout below is what [`adcp-go/targeting`](https://github.com/adcontextprotocol/adcp-go/tree/main/targeting) uses. Any backend (Aerospike, DynamoDB, in-memory, anything) is fine; the data shape is the reference, not a requirement. + +### Exposure log (per identity) + +``` +type: STRING (binary-encoded []ExposureEntry, lazy-pruned to window) +key: user:exposures:{HashToken(uid_type + ":" + user_token)} +value: [ + { impression_id, fcap_keys[], timestamp }, + ... +] +``` + +`HashToken` is a 16-byte SHA-256 prefix, hex-encoded. Binary entry encoding keeps the log compact ([`exposure_binary.go`](https://github.com/adcontextprotocol/adcp-go/blob/main/targeting/exposure_binary.go)) — a 30-day log for a typical user is a few KB. + +Each entry records: + +- `impression_id` — generated at TMPX decode. Same value across all of this impression's identity logs. +- `fcap_keys[]` — the labels this impression counts toward. +- `timestamp` — unix seconds. + +### Fcap policy (per fcap_key) + +``` +type: STRING (JSON-encoded FcapPolicy) +key: fcap_policy:{fcap_key} +value: { window_sec, max_count, active, updated_at } +``` + +Sliding window applied at read by filtering `timestamp >= now - window_sec`. No FIXED/SLIDING toggle. + +### Package configuration (per package) + +``` +type: STRING (JSON-encoded PackageConfig) +key: package:identity:{package_id} +value: { + fcap_keys: ["campaign:42", "advertiser:13"], + active: true, + updated_at: +} +``` + +Maps package → fcap_keys. The impression tracker reads this to figure out which labels to tag a new exposure with. + +## Write path: pixel → log + +On a TMPX-bearing pixel fire, the impression tracker: + +1. Decodes the TMPX (HPKE decrypt + binary parse) → resolved identities + `(seller_agent_url, package_id)` context. +2. Looks up the package's `fcap_keys`. +3. Generates one `impression_id`. +4. 
For each resolved identity, appends `{impression_id, fcap_keys, timestamp}` to `user:exposures:{hash(identity)}`. Prunes entries older than the longest active window (default 30 days). + +The read-modify-write per identity is not atomic in the reference impl ([`engine.go:478`](https://github.com/adcontextprotocol/adcp-go/blob/main/targeting/engine.go#L478)), so concurrent writes for the same user can lose an exposure. The reference impl explicitly accepts this; under-counting under contention is benign for fcap purposes. Atomic append via Lua or a `Store.Append` extension is a deferred optimization. + +## Evaluating whether this impression exhausted a cap + +After writing the exposure, the impression tracker decides whether any cap just fired. For each `fcap_key` on the exposure, it scans the user's identity logs: + +1. Read `user:exposures:{hash(identity)}` for every resolved identity. +2. Filter entries to `timestamp >= now - policy.window_sec` and `fcap_key ∈ entry.fcap_keys`. +3. Deduplicate by `impression_id` across all the user's identity logs. +4. Compare the deduped count to `policy.max_count`. + +If the deduped count is `>= max_count`, the cap fired on this impression. The impression tracker then writes a cap-fire entry to the Identity Match cap-state store for every `(user_identity, seller_agent_url, package_id)` whose package maps to the exhausted `fcap_key`. The expiration is `now + remaining_window`, where `remaining_window` is the time left before the oldest deduped exposure still in scope ages out of the window (`oldest.timestamp + window_sec - now`). + +For a cap on an advertiser-level label (`advertiser:13`) that maps to multiple packages on multiple sellers, the impression tracker emits one cap-fire entry per `(user_identity, seller_agent_url, package_id)` affected. The [boundary contract](/docs/trusted-match/identity-match-implementation#the-cap-fire-event) on `main` is package-scoped, so cross-dimensional caps fan out at write time. + +## SDK primitives + +The SDK ships impression handling as two composable functions, not one bundled call. 
Production tracking endpoints typically decode at intake and let a downstream worker write the store at its own pace; bundling decode+write into a single function would force synchronous topology and prevent buffering. + +``` +decodeTmpx(raw_tmpx) -> ExposureLog + Decrypts HPKE ciphertext, parses the published TMPX binary format + (/docs/trusted-match/specification#binary-format), returns the resolved + identity entries in a structured form ready for serialization onto a + topic or for direct write. + +writeExposure(log, fcap_keys, store_context) -> { ok, fired_caps } + Appends entries to each identity's exposure log with a fresh impression_id + and the supplied fcap_keys. Prunes entries older than the longest active + window. Returns the set of caps that fired on this impression — the + caller fans these out to the Identity Match cap-state store. +``` + +Plus the buyer-side management plane: + +``` +upsertPackage(seller_agent_url, package_id, fcap_keys, opts) +upsertFcapPolicy(fcap_key, {window_sec, max_count}) +inspectExposures(uid_type, user_token, fcap_key?) // debugging helper +``` + +Plus HPKE encrypt/decrypt as net-new SDK primitives (X25519 KEM, ChaCha20-Poly1305, HKDF-SHA256 per RFC 9180 `mode_base`). Encrypt is needed by the Identity Match service emitting TMPX; decrypt by the impression tracker invoking `decodeTmpx`. + +The same surface ships in `@adcp/client` (TS), `adcp-go`, and `adcp` (Python). + +## Production topology pattern + +A typical Scope3-style deployment: + +``` +publisher pixel fires {TMPX} → tracking endpoint + │ + decodeTmpx (synchronous, at intake) + │ + ▼ + pub/sub topic + │ + frequency_writer worker + │ + writeExposure (asynchronous) + │ + ▼ + valkey (exposure log) + │ + if cap fired → RecordCap to + Identity Match cap-state store +``` + +Decode at intake; emit to pub/sub for buffering; downstream worker writes the exposure log and emits any cap-fire events. 
Buffering, retries, dedup, observability, and abuse protection live at the queue layer — none of that is the SDK's job. A simpler synchronous pipeline (decode + write in the same handler) is also valid for low-volume deployments. + +## Conformance scenarios + +These walk through impression-tracker behavior end-to-end. They are buyer-internal mechanics; the on-wire observable is whatever cap-fire entries land in the Identity Match cap-state store, which surfaces as eligibility decisions in later `identity_match_request` calls. + +Setup for Scenario A (Scenario B declares its own packages and policy): `package = "pkg-42"` on `seller-a.example`, `fcap_keys: ["campaign:42"]`, `policy campaign:42 = {window_sec: 86400, max_count: 5}`. + +### Scenario A — multi-identity dedup + +User has two resolved identities: `rampid:abc` and `id5:def`. + +**Three impressions, each TMPX resolves both identities.** Each impression writes the same `impression_id` to both identity logs: + +``` +user:exposures:{hash(rampid:abc)} = [ + { impression_id: "imp-001", fcap_keys: ["campaign:42"], ts: ... }, + { impression_id: "imp-002", fcap_keys: ["campaign:42"], ts: ... }, + { impression_id: "imp-003", fcap_keys: ["campaign:42"], ts: ... } +] +user:exposures:{hash(id5:def)} = [ same three entries ] +``` + +At the third write, the impression tracker checks: union both logs, dedupe by `impression_id` → 3 distinct impressions. Under cap of 5 → no cap-fire entry emitted. + +**Three more impressions, only `rampid:abc` resolves (id5 lookup fails).** Logs after the 6th impression: + +``` +user:exposures:{hash(rampid:abc)} += [ imp-004, imp-005, imp-006 ] +user:exposures:{hash(id5:def)} unchanged +``` + +At write of imp-005 (the 5th distinct impression), the deduped count is 5 = `max_count` → the cap is exhausted on this impression. 
The impression tracker emits a cap-fire entry to the Identity Match cap-state store for both identities: + +``` +RecordCap(rampid:abc, [{seller-a.example, pkg-42}], expire_at) +RecordCap(id5:def, [{seller-a.example, pkg-42}], expire_at) +``` + +Counter-based merges are sensitive to exactly this kind of resolution instability. A SUM-merged read sees `rampid + id5 = 3 + 3 = 6` at imp-003 and fires the cap two impressions early. A MAX-merged read happens to match the true count in this sequence, because `rampid:abc` resolves on every impression (`max(rampid, id5) = max(5, 3) = 5` at imp-005), but it under-counts in the reverse pathological case, where successive impressions resolve disjoint identities. The log + `impression_id` dedup gets the count right regardless of identity-resolution stability. + +### Scenario B — cross-seller advertiser cap + +Two packages on different sellers, both mapped to the same advertiser-level label: + +``` +package:identity:pkg-A = { fcap_keys: ["advertiser:13"], active: true } // seller-a +package:identity:pkg-B = { fcap_keys: ["advertiser:13"], active: true } // seller-b +fcap_policy:advertiser:13 = { window_sec: 86400, max_count: 10 } +``` + +Ten impressions on `pkg-A` from `seller-a`. Each exposure entry's `fcap_keys` includes `advertiser:13`. At the 10th write, the deduped count for `advertiser:13` matches `max_count`. The impression tracker emits cap-fire entries for **every package mapped to `advertiser:13` across all sellers**, for every resolved identity: + +``` +RecordCap({each resolved identity}, [ + {seller-a.example, pkg-A}, + {seller-b.example, pkg-B}, +], expire_at) +``` + +A subsequent `identity_match_request` from `seller-b` for `pkg-B` returns `eligible_package_ids: []` because the cap-state entry is present. The advertiser-level cap enforces across sellers because the `fcap_key` is shared. No cross-seller coordination is required at the Identity Match service: the buyer agent's impression tracker is the single source of truth, and the cap-state store is the publication channel. 
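The dedup arithmetic in Scenario A can be replayed with a small sketch. It assumes in-memory lists in place of the valkey-backed logs, and the names `Exposure`, `deduped_count`, and `cap_fired` are hypothetical, chosen for illustration rather than taken from `adcp-go/targeting`. The timestamps are arbitrary values inside the 24-hour window.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Exposure:
    impression_id: str
    fcap_keys: tuple
    timestamp: int  # unix seconds


def deduped_count(logs, fcap_key, window_sec, now):
    """Count distinct impression_ids for fcap_key inside the sliding window."""
    seen = set()
    for log in logs:  # one log per resolved identity
        for e in log:
            if e.timestamp >= now - window_sec and fcap_key in e.fcap_keys:
                seen.add(e.impression_id)
    return len(seen)


def cap_fired(logs, fcap_key, policy, now):
    """True when the deduped count has reached the policy's max_count."""
    count = deduped_count(logs, fcap_key, policy["window_sec"], now)
    return count >= policy["max_count"]


policy = {"window_sec": 86400, "max_count": 5}
now = 1_700_000_000  # arbitrary reference time for the replay


def entry(n):
    # Impressions one minute apart, all well inside the window.
    return Exposure(f"imp-{n:03d}", ("campaign:42",), now - 60 * (7 - n))


# Impressions 1-3 resolve both identities: same entries in both logs.
rampid_log = [entry(n) for n in (1, 2, 3)]
id5_log = [entry(n) for n in (1, 2, 3)]
after_third = deduped_count([rampid_log, id5_log], "campaign:42", 86400, now)
sum_merge_after_third = len(rampid_log) + len(id5_log)  # naive SUM of counters

# Impressions 4 and 5 resolve rampid only.
rampid_log.append(entry(4))
fired_at_fourth = cap_fired([rampid_log, id5_log], "campaign:42", policy, now)
rampid_log.append(entry(5))
fired_at_fifth = cap_fired([rampid_log, id5_log], "campaign:42", policy, now)
```

After the third impression the deduped count is 3, while a naive per-identity sum already reads 6. The cap does not fire at the fourth impression and fires at the fifth, matching the walkthrough.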
+ +## Performance reference + +Numbers below are from [`targeting/scale_test.go`](https://github.com/adcontextprotocol/adcp-go/blob/main/targeting/scale_test.go) against the in-memory mock store, single goroutine. They isolate CPU from network. They describe the **impression tracker's** evaluation cost — the cost of scanning logs and deciding whether this impression just fired a cap. The Identity Match service's at-query-time cost is a separate, much smaller cap-state presence check. + +**Per-eval at write time, varying log size, single identity, single fcap_key:** + +| Prior exposures in user's log | Eval latency | +|---|---| +| 0 | 368 ns | +| 100 | 5.3 µs | +| 1,000 | 53 µs | +| 10,000 | 118 µs | + +Linear scan with binary lazy dedup; sub-millisecond at 10K entries. + +**Combined load (multi-identity, multi-package eval), varying all dimensions:** + +| packages mapped via fcap_keys | log entries / id | identities | CPU/eval | +|---|---|---|---| +| 100 | 1,000 | 3 | 1.0 ms | +| 1,000 | 1,000 | 3 | 7.5 ms ← realistic Scope3-shape load | +| 1,000 | 10,000 | 3 | 58 ms ← pathological tail (heavy users) | + +CPU scales in `packages × log_entries × identities`. The pathological tail is addressed by the algorithmic optimization in [adcp-go#103](https://github.com/adcontextprotocol/adcp-go/pull/103) (heuristic-gated prefilter bucket; gated at `numPackages > 50` to avoid regressions on small requests): + +| packages | log entries | identities | Before | After | Speedup | +|----------|------------:|-----------:|----------:|---------:|--------:| +| 1,000 | 100 | 3 | 784 µs | 71 µs | 11.0× | +| 1,000 | 1,000 | 3 | 7,566 µs | 287 µs | 26.4× | +| 1,000 | 10,000 | 3 | 57,861 µs | 1,500 µs | ~38× | + +Production sizing also depends on valkey round-trip latency, tail behavior under load, and the heavy-user impression-distribution shape. Mock-store CPU is the floor, not the production number. 
+ +## See also + +- [Frequency-Cap Data Flow](/docs/trusted-match/identity-match-implementation) — the cap-fire boundary contract this page sits behind +- [TMP Specification](/docs/trusted-match/specification) — wire spec, conformance invariants +- [`adcp-go/targeting`](https://github.com/adcontextprotocol/adcp-go/tree/main/targeting) — reference Go implementation of the model on this page +- [`adcp-go/targeting/fcap`](https://github.com/adcontextprotocol/adcp-go/tree/main/targeting/fcap) — reference cap-state store on the other side of the boundary