feat(audit): forensic audit log v2 (Goggles contract) #623
erskingardner wants to merge 8 commits into
Introduce the V2 forensic audit contract foundation:

- AuditDataMode (obfuscated_sensitive_data default / full_data), stamped on every AuditEvent.
- audit-log-event.v2.schema.json scaffold (v1 kinds carried forward plus the data-mode additions); bump AUDIT_LOG_SCHEMA_VERSION to v2.
- recorder_started v2 shape (session id moves to the top-level field); new audit_data_mode_changed kind.
- ForensicRecorder::set_data_mode rotates the backing store on a real mode change and writes a clear mode boundary; data_mode accessor.
- Version v2 filenames (audit-<engine_id>-v2.jsonl) so existing v1 files are left untouched rather than appended to.

Tests: schema<->Rust lockstep, all-variant serde round-trip, mode stamping (default + full-data open), rotation-with-boundary, no-op on unchanged mode, and v1-files-left-untouched.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
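The rotation behaviour described in this commit can be sketched roughly as follows. This is a minimal in-memory illustration, not the crate's code: `SketchRecorder`, its `files` vector (each inner `Vec` standing in for one backing file), and the row format are hypothetical, and placing the boundary row at the start of the new file is an assumption.

```rust
/// Data mode stamped on every audit event (variant names follow the commit message).
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
pub enum AuditDataMode {
    #[default]
    ObfuscatedSensitiveData,
    FullData,
}

impl AuditDataMode {
    pub fn as_str(self) -> &'static str {
        match self {
            AuditDataMode::ObfuscatedSensitiveData => "obfuscated_sensitive_data",
            AuditDataMode::FullData => "full_data",
        }
    }
}

/// Versioned v2 filename, so v1 files are never appended to.
pub fn audit_file_name(engine_id: &str) -> String {
    format!("audit-{engine_id}-v2.jsonl")
}

/// Hypothetical row format: every row carries the mode it was written under.
fn row(kind: &str, mode: AuditDataMode) -> String {
    format!("{{\"kind\":\"{}\",\"audit_data_mode\":\"{}\"}}", kind, mode.as_str())
}

/// Hypothetical in-memory recorder; each inner Vec stands in for one file.
pub struct SketchRecorder {
    mode: AuditDataMode,
    files: Vec<Vec<String>>,
}

impl SketchRecorder {
    pub fn open(mode: AuditDataMode) -> Self {
        SketchRecorder { mode, files: vec![vec![row("recorder_started", mode)]] }
    }

    pub fn data_mode(&self) -> AuditDataMode {
        self.mode
    }

    pub fn files(&self) -> &[Vec<String>] {
        &self.files
    }

    /// Rotates only on a real mode change; returns whether a rotation happened.
    pub fn set_data_mode(&mut self, new_mode: AuditDataMode) -> bool {
        if new_mode == self.mode {
            return false; // unchanged mode is a no-op
        }
        self.mode = new_mode;
        // Assumption: the boundary row opens the new file and is stamped with the new mode.
        self.files.push(vec![row("audit_data_mode_changed", new_mode)]);
        true
    }
}
```

Calling `set_data_mode` with the current mode leaves the file count unchanged; calling it with a different mode adds one file whose first row is the boundary.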
Wire the audit data mode end-to-end so it can be toggled and persisted:

- StoredAuditLogSettings.data_mode (TEXT) with an additive ALTER migration for pre-v2 databases; AuditLogSettings.data_mode (AuditDataMode) DTO and conversions; AuditLogSettingsFfi.data_mode + AuditDataModeFfi enum.
- open_audit_recorder opens in the persisted mode.
- MarmotAppRuntime::set_audit_log_settings hot-swaps a live recorder when the mode changes (engine/session set_audit_recorder_data_mode -> recorder set_data_mode), rotating the file with a clear audit_data_mode_changed boundary (requirement #4). New SetAuditDataMode worker command.
- marmot-app re-exports AuditDataMode for FFI/CLI consumers.

Tests: storage default/persist + legacy-column migration, app settings round-trip incl. data_mode, uniffi smoke round-trip + persistence, and a runtime integration test proving a live full_data toggle rotates the recorder with an entirely-full_data file and a boundary row.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
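One way to picture the TEXT column conversion for this commit is the sketch below. The function names `mode_to_text` and `mode_from_text` are hypothetical, and falling back to the obfuscated default for a missing or unrecognised value is an assumption about how legacy rows are treated, not something the commit states.

```rust
/// Data mode persisted in the settings table (variant names follow the commit message).
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
pub enum AuditDataMode {
    #[default]
    ObfuscatedSensitiveData,
    FullData,
}

/// Hypothetical conversion to the stored TEXT value.
pub fn mode_to_text(mode: AuditDataMode) -> &'static str {
    match mode {
        AuditDataMode::ObfuscatedSensitiveData => "obfuscated_sensitive_data",
        AuditDataMode::FullData => "full_data",
    }
}

/// Hypothetical conversion from the stored TEXT value.
/// Assumption: a pre-v2 row with no value, or an unknown string, falls back to
/// the privacy-preserving default rather than failing.
pub fn mode_from_text(stored: Option<&str>) -> AuditDataMode {
    match stored {
        Some("full_data") => AuditDataMode::FullData,
        _ => AuditDataMode::default(),
    }
}
```

Defaulting to the obfuscated mode means a database that predates the column can never open a recorder in `full_data` by accident.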
Capture transport-layer identifiers so an analyzer can correlate engine activity with raw transport traffic (requirement #8):

- forensics: AuditTransportWire reusable envelope (wire id/kind/pubkey, transport_group_id, relay url, subscription id, nostr event id/kind/pubkey, gift-wrap/welcome ids, publish_result_id) + transport_received kind; wire attached to AuditTransportContext and to publish_attempt/outcome/failure. Schema grows to match; lockstep + serde tests updated.
- inbound: nostr adapter populates a generic TransportWireMetadata on TransportDeliverySource (h-tag transport group id read from the peeler-mapped envelope); session maps it to AuditTransportWire (mirroring wire_* into nostr_* for nostr); engine emits transport_received before ingest_entry, reusing the ingest payload digest.
- outbound: account runtime stamps the available wire envelope (transport source + transport group id) on publish rows; post-wrap relay event id is produced in the adapter and left for later.

Wire identifiers are transport-layer (e.g. ephemeral nostr pubkeys), never the author's account identity, so they are safe in both data modes.

Tests: engine producer test (transport_received precedes ingest_entry with the wire fields); traits snapshots unchanged (optional fields skip when None); all producer crates green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Record who each outbound message is expected to reach, from authenticated membership at send time (requirement #9):

- forensics: recipient_expectation kind + RecipientExpectation / MessageArtifactKind / RecipientScope; reshape send_outcome and create_group_outcome to an outbound_messages inventory ({msg_id, artifact_kind, transport?, recipient_expectation?}). Schema grows (recipientExpectation/outboundMessage/messageArtifactKind/recipientScope/pubkeyHex); lockstep + serde samples updated.
- engine: recipient_expectation_records computes, per outbound message, the expected recipients: group messages/commits target all OTHER current members (roster minus self via do_members + identity.self_id); welcomes target only the added member (from the welcome envelope recipient). Emitted as recipient_expectation rows after the outcome. Full member pubkeys are included only in full_data mode; member refs (salted hashes) + counts are always emitted.

Tests: engine producer test proving a welcome scopes added_member_only and an app message scopes all_other_current_group_members, with no recipient pubkeys in the default obfuscated mode.

Note: engine_ingest_buffers_future_epoch_app_message_as_convergence_witness is a pre-existing flake (fails ~1/3 multi-threaded on master too); unrelated to this change.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
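The recipient rule in this commit (roster minus self for group messages and commits, only the added member for welcomes) can be sketched as below. The `Artifact` and `Expectation` types and the `expected_recipients` function are hypothetical stand-ins, plain strings stand in for pubkeys, and the always-emitted salted member refs are omitted for brevity.

```rust
/// Scope names follow the commit message.
#[derive(Debug, PartialEq, Eq)]
pub enum RecipientScope {
    AllOtherCurrentGroupMembers,
    AddedMemberOnly,
}

/// Hypothetical outbound artifact; a welcome names the member it was built for.
pub enum Artifact<'a> {
    GroupMessage,
    Commit,
    Welcome { added_member: &'a str },
}

/// Hypothetical expectation row: the count is always present, pubkeys only in full-data mode.
#[derive(Debug, PartialEq, Eq)]
pub struct Expectation {
    pub scope: RecipientScope,
    pub recipient_count: usize,
    pub recipient_pubkeys: Option<Vec<String>>,
}

pub fn expected_recipients(
    artifact: &Artifact,
    roster: &[&str],
    self_id: &str,
    full_data: bool,
) -> Expectation {
    let (scope, recipients): (RecipientScope, Vec<String>) = match artifact {
        // Welcomes target only the member being added.
        Artifact::Welcome { added_member } => {
            (RecipientScope::AddedMemberOnly, vec![added_member.to_string()])
        }
        // Group messages and commits target the roster minus the sender.
        Artifact::GroupMessage | Artifact::Commit => (
            RecipientScope::AllOtherCurrentGroupMembers,
            roster.iter().filter(|m| **m != self_id).map(|m| m.to_string()).collect(),
        ),
    };
    Expectation {
        scope,
        recipient_count: recipients.len(),
        // Full pubkeys are withheld unless full-data mode is active.
        recipient_pubkeys: if full_data { Some(recipients) } else { None },
    }
}
```

With a roster of three members and the sender among them, a group message in the default mode yields a count of 2 and no pubkeys.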
Emit a rich, deterministic trace of distributed-convergence decisions (requirement #10):

- convergence.rs: select_canonical_branch_traced returns a BranchSelectionTrace (per-candidate scores + eligibility, the rule-by-rule comparison between the winner and runner-up marking the decisive rule, and losing branches). The trace is a pure function of the candidate SET: candidates, losing ids, and per-candidate app_witnesses are ordered by id/(epoch,sender) so convergence stays input-order independent.
- canonicalization.rs: CanonicalizationResult.selection_trace carries it out.
- distributed_convergence.rs: a salted, per-run convergence run_id; emits convergence_run_state lifecycle rows (started/waiting/blocked/unrecoverable/applied/stable) plus the reshaped convergence_decision, all correlated by the run_id on a new convergence audit context.
- forensics: AuditConvergenceContext, ConvergencePhase, ConvergenceCandidate/Score/AppWitness/RuleEvaluation, convergence_run_state kind, reshaped convergence_decision (candidates + rule_trace + losing branches). Full committer/witness pubkeys are full-data only; refs/digests always emitted.

Tests: traced-selection unit test (decisive rule, candidates, losers, eligibility), engine producer test (run_state + decision share a run_id), and the order-independence canonicalization proptest (now green with the sorted trace). Schema lockstep + serde + conformance suite all pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
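The order-independence property in this commit comes from sorting before tracing. The sketch below illustrates that idea only: `Candidate`, `Trace`, and `select_traced` are hypothetical, and the single selection rule used here (highest score among eligible candidates, ties broken by smallest id) is a placeholder, not the engine's real rule set.

```rust
/// Hypothetical candidate; the real trace carries richer scores and witnesses.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct Candidate {
    pub id: String,
    pub score: u64,
    pub eligible: bool,
}

/// Hypothetical trace: sorted candidates, the winner, and the losing ids.
#[derive(Debug, PartialEq, Eq)]
pub struct Trace {
    pub candidates: Vec<Candidate>,
    pub winner: Option<String>,
    pub losing_ids: Vec<String>,
}

pub fn select_traced(mut candidates: Vec<Candidate>) -> Trace {
    // Sorting by id first makes everything below a function of the candidate
    // set, not of the order in which candidates arrived.
    candidates.sort_by(|a, b| a.id.cmp(&b.id));

    // Placeholder rule: highest score among eligible candidates, smallest id on ties.
    let winner: Option<String> = candidates
        .iter()
        .filter(|c| c.eligible)
        .max_by(|a, b| a.score.cmp(&b.score).then_with(|| b.id.cmp(&a.id)))
        .map(|c| c.id.clone());

    // Losing ids inherit the sorted order of `candidates`.
    let losing_ids: Vec<String> = candidates
        .iter()
        .filter(|c| Some(&c.id) != winner.as_ref())
        .map(|c| c.id.clone())
        .collect();

    Trace { candidates, winner, losing_ids }
}
```

Feeding the same candidates in two different orders produces equal `Trace` values, which is the property the proptest in the commit checks for the real selector.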
`engine_ingest_buffers_future_epoch_app_message_as_convergence_witness` failed ~1/3 of multi-threaded runs (passed single-threaded / in isolation; reproduced on master, pre-existing).

Root cause: `carol.ingest(...)` buffers the future-epoch messages and stamps `last_convergence_relevant_input_ms` with the engine's real monotonic clock (`convergence_now_ms`). The test then converged with a logical `now_ms` of 2_000, and settlement requires `now_ms - last_input >= settlement_quiescence_ms` (1_000). Under parallel load the real elapsed time from engine creation to the buffering ingest exceeds ~1s, so `last_input > 1_000` and `2_000 - last_input` no longer clears quiescence -> ConvergenceStatus::Resolving instead of Settled.

Fix: converge with a logical `now_ms` far past the quiescence window (1_000_000), matching the ~20 other ingest-then-converge tests in this file, so the settle is independent of real elapsed time. Verified with 10 multi-threaded runs of the full file (0 failures).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
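The arithmetic behind the flake can be checked with a small sketch. `is_settled` is a hypothetical helper that encodes only the inequality quoted in the commit; using `saturating_sub` to avoid underflow is an assumption, and the 1_200 ms input timestamp is an illustrative value for a slow parallel run.

```rust
/// Settlement check as quoted in the commit: now_ms - last_input >= quiescence.
/// Assumption: subtraction saturates at zero instead of underflowing.
pub fn is_settled(now_ms: u64, last_input_ms: u64, quiescence_ms: u64) -> bool {
    now_ms.saturating_sub(last_input_ms) >= quiescence_ms
}

/// Quiescence window used by the test, per the commit message.
pub const QUIESCENCE_MS: u64 = 1_000;
```

With a fast run (`last_input_ms` of 500), `is_settled(2_000, 500, QUIESCENCE_MS)` holds because 1_500 >= 1_000. With a slow run (`last_input_ms` of 1_200), `is_settled(2_000, 1_200, QUIESCENCE_MS)` fails because 800 < 1_000, which matches the Resolving outcome. Using a logical `now_ms` of 1_000_000 clears the window in both cases.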
Final audit-v2 phase. Surface decrypted content under explicit full_data opt-in (requirements #6, #7) and finish the obfuscated/full-data contract:

- forensics: message_content_decoded kind (+ MessageAuthor / DecodedPayload / DecodedApplicationEvent / AttachmentMetadata), source_context kind (+ AuditSourceContext, also on the audit context), and GroupStateChanged reshaped to a `value` object (digest/len always; text/json/pubkeys full-data) plus actor/subject pubkeys. Schema grows to match and gains the obfuscated `allOf` guards that forbid every full-data-only field (decoded content, account/actor/subject/committer/witness/recipient pubkeys, cleartext group-state values) when audit_data_mode is obfuscated_sensitive_data.
- engine: on a successful application-message decrypt, full_data mode decodes the MarmotAppEvent and emits message_content_decoded (author member ref + full pubkey, decoded kind/content/tags/created_at, NIP-94 imeta attachments); obfuscated mode never decodes. GroupStateChanged carries the value object + actor/subject pubkeys gated on full_data.
- marmot-app: open_audit_recorder emits a source_context row (account_label always; account pubkey full-data only).

Tests: forensics serde+lockstep over all new kinds; engine full-data-positive (decoded content + author pubkey present, every line full_data) and obfuscated-negative (ingest recorded, no decoded content) producer tests. fmt + clippy clean; engine, app, storage, uniffi audit suites green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
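The "obfuscated mode never decodes" rule in this commit can be expressed as a lazy gate. This is a sketch under assumptions: `full_data_only` is a hypothetical helper, and the closure stands in for whatever decoding step the engine performs.

```rust
/// Data mode names follow the commit messages.
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
pub enum AuditDataMode {
    #[default]
    ObfuscatedSensitiveData,
    FullData,
}

/// Hypothetical gate: the decode closure runs only in full-data mode, so
/// obfuscated mode never even produces the sensitive value.
pub fn full_data_only<T>(mode: AuditDataMode, decode: impl FnOnce() -> T) -> Option<T> {
    match mode {
        AuditDataMode::FullData => Some(decode()),
        AuditDataMode::ObfuscatedSensitiveData => None,
    }
}
```

Taking a closure rather than an already-decoded value means the obfuscated path cannot leak content through a temporary: the decode step is skipped, not merely discarded.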
Goggles merged the audit-log v2 PRD; adopt its schema verbatim (darkmatter's copy is now byte-identical) and reconcile the Rust model + producers.

Schema/model deltas:

- transportWireEnvelope: wire_kind is now a string (the numeric Nostr kind stays on nostr_kind); welcome_event_id replaced by welcome_nostr_event_id, welcome_rumor_event_id, welcome_key_package_tag.
- group_state_changed: new membership_change_source (self_leave / admin_action / convergence / remote_commit / unknown), derived from the change + actor; change_kind gains topic_changed.
- convergence_decision: candidates now required (always serialized); convergenceCandidate gains state_digest + last_input_time_ms.
- added optional fields: artifact_kind on publish_attempt/outcome/failure, message_state_changed, peeler_outcome; relay_url on publish_*; detail + required_acks on publish_failure; origin_commit_id on epoch_confirmed; state_digest on snapshot_created.
- pattern tightenings (msg_id->messageId/digestHex, *_digest->digestHex, wire/nostr pubkeys->pubkeyHex): no behavioral change, since real Nostr ids, payload digests, and pubkeys are already 64-hex (only synthetic test ids were shorter).

Producers updated in cgka-engine (audit_helpers + ingest + engine), cgka-session (wire mapping: wire_kind->string), marmot-account (publish rows: artifact_kind, relay_url, detail, required_acks), and transport-nostr-adapter / traits (drop the unused welcome_event_id carrier field). State-digest and per-candidate last_input are left unset (cost / not per-candidate) and noted inline.

Tests: schema<->Rust lockstep + all-variant serde round-trip updated; new recursive conformance test proves serialized output uses only schema-allowed keys (guards additionalProperties:false). fast-ci, engine, account, conformance, app, and uniffi suites all green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
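The recursive key check mentioned in the Tests line can be sketched without a JSON-Schema validator. Everything here is illustrative: `Value` is a hand-rolled stand-in for a parsed JSON tree, `unknown_keys` is a hypothetical name, and a single flat allowlist is a simplification of a schema that would normally allow different keys per object.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Minimal stand-in for a parsed JSON tree; scalars are collapsed to Leaf.
pub enum Value {
    Leaf,
    Array(Vec<Value>),
    Object(BTreeMap<String, Value>),
}

/// Returns every object key, at any depth, that is not in the allowlist.
pub fn unknown_keys(value: &Value, allowed: &BTreeSet<&str>) -> Vec<String> {
    let mut found = Vec::new();
    collect(value, allowed, &mut found);
    found
}

fn collect(value: &Value, allowed: &BTreeSet<&str>, found: &mut Vec<String>) {
    match value {
        Value::Leaf => {}
        Value::Array(items) => {
            for item in items {
                collect(item, allowed, found);
            }
        }
        Value::Object(fields) => {
            for (key, child) in fields {
                if !allowed.contains(key.as_str()) {
                    found.push(key.clone());
                }
                // Recurse so nested objects are held to the same rule.
                collect(child, allowed, found);
            }
        }
    }
}
```

An empty result plays the role of passing `additionalProperties: false`; a non-empty result names the offending keys, for example a leftover `welcome_event_id` after the field split.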
Chapters generated by Stage for commit aa368ae on Jun 25, 2026 8:18pm UTC.
Implements the full Marmot forensic audit-log v2 contract end-to-end across darkmatter, matching the schema merged on Goggles master. Builds in six reviewable phases plus a flaky-test fix and a final schema-sync.

What changed
- `marmot-forensics`: `AuditDataMode` (`obfuscated_sensitive_data` default / `full_data`) stamped on every event; v2 schema; `recorder_started` reshaped; `audit_data_mode_changed` kind; `ForensicRecorder::set_data_mode` rotates the file on a real mode change; versioned `audit-<engine_id>-v2.jsonl` filenames so existing v1 files are left untouched.
- `data_mode` is wired through `StoredAuditLogSettings` (additive column migration), `AuditLogSettings`, and `AuditLogSettingsFfi` / `AuditDataModeFfi`; the recorder opens in the persisted mode; `MarmotAppRuntime::set_audit_log_settings` hot-swaps a live recorder with a clear boundary when the mode changes.
- `transport_received` kind + a reusable `AuditTransportWire` envelope on inbound (`transport_received` / context) and outbound (`publish_*`) rows. Inbound wire data is sourced from the Nostr adapter and emitted by the engine before ingest.
- `recipient_expectation` rows; `send_outcome` / `create_group_outcome` carry an `outbound_messages` inventory. Group messages/commits target all other current members; welcomes target only the added member. Full recipient pubkeys are full-data only.
- `convergence_run_id`, `convergence_run_state` lifecycle rows, and a reshaped `convergence_decision` carrying every candidate + score, a `rule_trace` recording each selector rule and the decisive one, and losing branches. The trace is a pure function of the candidate set (order-independent).
- `message_content_decoded` (decoded app event, author identity, NIP-94 attachments) emitted at engine ingest, strictly gated on `full_data`; `source_context`; `group_state_changed` value object + actor/subject pubkeys; schema `allOf` guards forbidding every full-data-only field in obfuscated mode.
- `wire_kind` is now a string, the `welcome_*` wire fields split, `membership_change_source` added, `convergence_decision.candidates` required, plus several optional fields (`artifact_kind`, `relay_url`, `detail`, `origin_commit_id`, `state_digest`, `last_input_time_ms`) and pattern tightenings.
- Fixes the flaky `engine_ingest_buffers_future_epoch_app_message_as_convergence_witness` test (pre-existing on master; it mixed the engine's real monotonic convergence clock with a logical `now_ms`).

Privacy posture
Obfuscated mode (the default) never logs plaintext, decoded content, full author/recipient/group-state values, or account pubkeys; full-data is an explicit opt-in. Neither mode ever logs bearer/upload tokens, auth headers, private keys, ciphertext, or raw MLS bytes.
Reviewer notes
- Conformance is checked by a recursive test that serialized output uses only schema-allowed keys (guards `additionalProperties: false`), since no JSON-Schema validator crate is available offline.
- The `messageId` / `digestHex` tightening is non-breaking: real Nostr ids, payload digests, and pubkeys are already 64-hex (only synthetic test fixtures were shorter).
- `convergenceCandidate.state_digest` and per-candidate `last_input_time_ms` are intentionally left unset (cost / not per-candidate), noted inline.

Verification
`just fast-ci` green workspace-wide (incl. OTLP feature builds); fmt + clippy clean; `marmot-forensics`, `cgka-engine`, `cgka-traits` (snapshots unchanged), `cgka-session`, `marmot-account`, `cgka-conformance-simulator`, `marmot-app`, and `marmot-uniffi` audit suites all green. Rebased on latest `origin/master`.

🤖 Generated with Claude Code