You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
This issue reflects the development branch. Not a released version yet.
Note
ID convention: Two-letter prefixes (FR-, SR-, PR-, TR-, PE-, DR-, UX-) reference formal requirements in PRD.md. Single-letter prefixes (F- for findings, D- for deferred) are temporary IDs local to this audit.
Non-hermetic history tests fail under ambient git config (F-032)
Production-Readiness Verdict:
Suitable for v0.6.0 release for local/Ollama-centric workflows.
Address F-028, F-030, and F-032 before v1.0 or enterprise distribution.
Audit Dimension Summary
Dimension
Status
Key Concern
1. Language & Modern Rust
🟢 Good
Minor strip_prefix opportunities
2. Async & Concurrency
🟡 Needs Attention
Sync blocking in hook/clipboard paths
3. Security & Safety
🟡 Needs Attention
forbid(unsafe) gap, Ollama URL validation
4. Error Handling
🟢 Good
thiserror + miette, solid diagnostics
5. CLI Design
🟢 Good
Config cascade correct; doctor misleading for cloud
6. Performance
🟡 Needs Attention
Binary size 25MB > 15MB target
7. Testing & CI
🟡 Needs Attention
Non-hermetic tests, Ubuntu-only CI
8. Compliance
🟢 Good
ANSSI-FR mostly aligned, SPDX/REUSE compliant
Immediate Priority
Quick wins that should ship soon. All verified against source code.
F-001: Add overflow-checks = true to [profile.release] — unanimous across all reviewers, ANSSI-FR aligned. Cargo.toml:131-134
F-011: Replace l[1..] byte indexing with strip_prefix in context.rs and splitter.rs (grep for &line[1..] and &l[1..])
F-015: Add // SAFETY: comments to #[allow(clippy::cast_possible_truncation)] in analyzer.rs and differ.rs (multiple sites — grep for cast_possible_truncation)
F-028: Add #![forbid(unsafe_code)] to main.rs — binary crate re-declares modules via mod instead of importing from lib.rs, bypassing the safety attribute. Highest-impact finding (single-reviewer catch, confirmed by synthesizer). main.rs:8
Security & Safety
F-004: API keys stored as plain Option<String>, never zeroed from memory, inherited by child processes. Consider secrecy::SecretString.
F-009: Project config values briefly exist in Config struct before security overwrite. Single synchronous function, no concurrent access — low risk but could be cleaner. (synthesizer downgraded to INFO)
F-010: Cloud provider constructors default to empty API key via unwrap_or_default(). Normal flow validates key beforehand, but defense-in-depth assertion would help. (synthesizer downgraded to INFO)
F-019: --allow-secrets bypass needs risk documentation in CLI help text.
F-030: Ollama URL validation is prefix-only (starts_with("http://")) with no loopback enforcement. User config or env vars can redirect diff traffic to arbitrary remote endpoints. PRD SR-001 specifies loopback-only resolution — not implemented.
Async & Concurrency
F-002: Synchronous std::process::Command blocking tokio runtime in hook_dir() and copy_to_clipboard(). Use tokio::process::Command or spawn_blocking.
F-003: Blocking std::io::stdin().read_line() on async thread during secret confirmation. Fail-closed design mitigates risk.
F-020: Unbounded concurrent git show process spawning — no semaphore. 50 staged files = 100 processes.
F-025: Orphaned Ctrl+C handler task with .ok() silently discarding signal registration errors.
Testing & CI
D-037: Deleted/Renamed ChangeStatus variants never used in test fixtures
D-039: classify_span_change None path untested
D-040: make_symbol test helpers hardcode line:1, end_line:10 — partially addressed
D-047: No fuzz testing for non-Rust signature extraction — fuzz_signature only tests Rust parser
F-007: MSRV "verification" in CI uses the same toolchain as all jobs — not actually testing the floor. Add separate stable job.
F-016: CI only tests on Ubuntu. Platform-specific code (Unix permissions, macOS/Windows clipboard, hook paths) untested. Add macOS runner at minimum.
F-017: No cargo deny in CI. PRD SR-005 specifies it for license compliance. Currently only cargo audit.
F-018: No clippy.toml — would catch sync Command usage (F-002) and stray dbg!/println! at lint time via disallowed-methods/disallowed-macros
F-023: No release automation workflow. PRD DR-002 specifies GitHub Releases via cargo-dist.
F-032 / D-007: History tests non-hermetic — analyze_repo_with_enough_commits and analyze_respects_sample_size fail under ambient git config (GPG signing, hooks). Set GIT_CONFIG_NOSYSTEM=1, GIT_CONFIG_GLOBAL=/dev/null, HOME to tempdir. Verified by reviewer actually executing cargo test.
TR-008: Layer 2 LLM quality testing missing — eval harness tests pre-LLM pipeline but never validates actual commit message quality against a real or mocked LLM
Performance & Binary Size
D-074: Targeted hunk parsing optimization — scan @@ headers instead of full lines().
F-012: builtin_patterns() recompiles 24 regexes per build_patterns() call when custom/disabled patterns are configured. Cache base patterns separately.
F-021: classify_span_change allocates two full Strings for whitespace comparison instead of streaming Iterator::eq(). Also applies to new classify_span_change_rich() and AstDiffer::bodies_semantically_equal().
F-022: Secret scanning could use RegexSet for batch first-pass matching instead of 24 serial regex per line.
PE-007: Token-aware truncation — replace char-based budget (~4:1 approximation) with actual BPE/tiktoken counting for accurate context window utilization.
Semantic Analysis
D-002: Adaptive symbol budget — Partially addressed. Budget now adapts: 20% with structural diffs, 30% with signatures only, 20% base. Full count-based scaling not yet implemented.
D-003: Allow .scm query file changes if body-node extraction fails for a language
D-005: Smart signature truncation — preserve return type at end instead of simple 200-char cap
D-060: Go import block parsing — multiline import () blocks not handled by line-by-line detection
D-061: Error handling regex edge cases — ? at line end and Result< match in strings/comments. Also: detect_intents() threshold of >2 matches may produce false positives in large diffs with common Rust patterns.
D-062: Test correlation pattern gaps — test_foo.rs, nested test dirs not matched by stem-based matching
D-064: Lifetime/generic bound changes not detected
D-065: BodyUnchanged redundant when other changes exist — omit to save tokens
D-066: Combined token budget unanalyzed — Partially addressed (FR-070). Symbol/diff budget rebalanced when structural diffs are present. A comprehensive budget simulation across all new prompt sections (STRUCTURED CHANGES, IMPORTS CHANGED, RELATED FILES, INTENT) together has not been done.
D-068: Import alias handling — use foo as bar alias not captured
FR-073: Structural fingerprinting / move detection — AST structure hashes to detect function moves with zero semantic changes.
Splitter & Message Quality
D-067: Splitter synergy vague — need specific merge threshold for SymbolDiff integration
D-069: Internal reranking for small models — sample 2 candidates, score deterministically
D-071: Import/dependency scanning — use crate::...::file_B_stem for stronger dependency edges
D-072: LLM-assisted splitting — 2-stage classify+generate as tie-breaker
D-075: New-import-as-feat signal — detect "new import of types not previously used" as feat indicator
FR-074: Symbol Dependency Graph — replace hardcoded path heuristics in splitter with actual AST-derived dependencies
FR-075: Configurable file categorization — custom patterns in config for proprietary build systems
CLI & UX
D-016: Diff truncated mid-line in --dry-run output — display/streaming issue
D-017: Commit message printed twice in --dry-run output
F-029: commitbee doctor doesn't verify cloud provider connectivity — only checks API key presence (is_some()). Ollama gets a real health check but OpenAI/Anthropic don't. Call provider.verify().await.
F-034: Terminal progress lacks ASCII fallback for TERM=dumb or limited terminals. PRD UX-002 specifies this.
Observability
F-013: No #[instrument] spans on key pipeline functions (generate_commit, extract_symbols, build). Would improve debugging and timing visibility.
F-024: Test count documentation drift — Fixed in v0.6.0. All docs now say 424.
Code Quality & Modernization
D-028: HARD LIMIT section in user prompt duplicates system prompt instructions
D-032: CodeSymbol all-pub fields without #[non_exhaustive] — only matters if publishing as library
D-034: PromptContext all-pub fields without #[non_exhaustive] — same caveat
F-005: reqwest TLS backend not explicitly pinned. Synthesizer notes: reqwest 0.13 uses platform-native TLS by default. Pin to rustls-tls if pure-Rust stack desired, but native TLS is acceptable.
F-027: Lossy float-to-integer cast in history.rs:49 — use .round() as u32
F-028 (extended): Refactor main.rs to consume library crate instead of re-declaring module tree
Architecture
D-052: Migrate git show calls to gix object reading — reduce process overhead on large repos
D-055: Architecture plugin trait for AnalyzerService — trait-based per-language isolation
D-056: Evidence "Why" hook — hidden rationale field for debugging incorrect inference
D-057: Contextual sibling API display — include unchanged public functions in same scope for context
F-031: Non-atomic split commits — TOCTOU window between unstage/re-stage with no rollback. Documented known issue.
Compliance
Framework
Applies?
Current Status
Gap
ANSSI-FR Secure Rust
Advisory
Mostly compliant
overflow-checks (F-001), binary forbid(unsafe) gap (F-028)
SPDX/REUSE
Yes
Compliant
None
Open Source Licensing
Yes
Compliant — all deps MIT/Apache-2.0
None
EU AI Act
Minimal risk
Compliant
None
GDPR/CCPA
Marginally
Adequate mitigations
PII in diffs is transient
EU Cyber Resilience Act
Commercial only
Not started
SBOM generation needed by Dec 2027
Dependency Health
All 30 direct dependencies clean. No CVEs. Full audit by cargo audit (433 crate graph).
Notable
Version
Note
reqwest
0.13.2
Platform-native TLS default
gix
0.80.0
Historical gix-transport advisory patched
tokio
1.50.0
Named-pipe advisory patched
tree-sitter
0.26.7
Clean
Methodology
Audit process details
Dialectic Verification Pipeline
The audit uses a multi-stage dialectic verification process designed to prevent anchoring bias:
Parallel independent review — Multiple AI models (3-5 per run) analyze the target simultaneously. Each reviewer works in isolation with no access to other reviewers' findings. Models are drawn from different families to maximize perspective diversity.
Deduplication — An orchestrator merges overlapping findings, flags conflicts, and normalizes severity.
Independent synthesis — A fresh-context synthesizer (no exposure to the orchestration process) reads every finding and verifies each claim against actual source code. Every finding receives an explicit ACCEPT/REJECT/MODIFY decision with evidence.
This Audit
Code audit (2026-03-26): 5 reviewers across 3 model families. 97 raw findings deduplicated to 34. Synthesizer accepted 30, rejected 2, modified 4.
Deferred items: Accumulated from prior dialectic reviews of implementation plans (AST context overhaul, semantic understanding tiers, message quality), architecture analyses, and post-implementation code reviews. Each followed the same parallel review + synthesis pipeline.
Notable: One reviewer uniquely identified F-028 (the forbid(unsafe) binary crate gap), F-030 (loopback enforcement), and F-032 (non-hermetic tests) — the latter verified by actually executing cargo test on the codebase. This validates that independent multi-model review catches issues no single reviewer would find.
Consolidated audit · 2026-03-27 · v0.6.0 (dev) · ~18K lines · Edition 2024 · MSRV 1.94 · Methodology
Important
This issue reflects the development branch. Not a released version yet.
Note
ID convention: Two-letter prefixes (FR-, SR-, PR-, TR-, PE-, DR-, UX-) reference formal requirements in
PRD.md. Single-letter prefixes (F- for findings, D- for deferred) are temporary IDs local to this audit.Executive Summary
Strengths, concerns & production-readiness verdict
Top 3 Strengths:
unsafecode —#![forbid(unsafe_code)]enforced (though see F-028 for binary crate gap)Top 3 Concerns:
#Production-Readiness Verdict:
Suitable for v0.6.0 release for local/Ollama-centric workflows.
Address F-028, F-030, and F-032 before v1.0 or enterprise distribution.
Audit Dimension Summary
strip_prefixopportunitiesforbid(unsafe)gap, Ollama URL validationdoctormisleading for cloudImmediate Priority
Quick wins that should ship soon. All verified against source code.
overflow-checks = trueto[profile.release]— unanimous across all reviewers, ANSSI-FR aligned.Cargo.toml:131-134l[1..]byte indexing withstrip_prefixincontext.rsandsplitter.rs(grep for&line[1..]and&l[1..])// SAFETY:comments to#[allow(clippy::cast_possible_truncation)]inanalyzer.rsanddiffer.rs(multiple sites — grep forcast_possible_truncation)#![forbid(unsafe_code)]tomain.rs— binary crate re-declares modules viamodinstead of importing fromlib.rs, bypassing the safety attribute. Highest-impact finding (single-reviewer catch, confirmed by synthesizer).main.rs:8Security & Safety
Option<String>, never zeroed from memory, inherited by child processes. Considersecrecy::SecretString.unwrap_or_default(). Normal flow validates key beforehand, but defense-in-depth assertion would help. (synthesizer downgraded to INFO)--allow-secretsbypass needs risk documentation in CLI help text.starts_with("http://")) with no loopback enforcement. User config or env vars can redirect diff traffic to arbitrary remote endpoints. PRD SR-001 specifies loopback-only resolution — not implemented.Async & Concurrency
std::process::Commandblocking tokio runtime inhook_dir()andcopy_to_clipboard(). Usetokio::process::Commandorspawn_blocking.std::io::stdin().read_line()on async thread during secret confirmation. Fail-closed design mitigates risk.warn!log.git showprocess spawning — no semaphore. 50 staged files = 100 processes..ok()silently discarding signal registration errors.Testing & CI
ChangeStatusvariants never used in test fixturesclassify_span_changeNone path untestedmake_symboltest helpers hardcodeline:1, end_line:10— partially addressedfuzz_signatureonly tests Rust parserstablejob.cargo denyin CI. PRD SR-005 specifies it for license compliance. Currently onlycargo audit.clippy.toml— would catch syncCommandusage (F-002) and straydbg!/println!at lint time viadisallowed-methods/disallowed-macroscargo-dist.analyze_repo_with_enough_commitsandanalyze_respects_sample_sizefail under ambient git config (GPG signing, hooks). SetGIT_CONFIG_NOSYSTEM=1,GIT_CONFIG_GLOBAL=/dev/null,HOMEto tempdir. Verified by reviewer actually executingcargo test.Performance & Binary Size
@@headers instead of fulllines().builtin_patterns()recompiles 24 regexes perbuild_patterns()call when custom/disabled patterns are configured. Cache base patterns separately.classify_span_changeallocates two fullStrings for whitespace comparison instead of streamingIterator::eq(). Also applies to newclassify_span_change_rich()andAstDiffer::bodies_semantically_equal().RegexSetfor batch first-pass matching instead of 24 serial regex per line.opt-level = "z"/"s", consider trimming default features, add binary size gate in CI.Semantic Analysis
.scmquery file changes if body-node extraction fails for a languagesym_name(pattern matches comments/strings (false positives)SymbolKey(kind, name, file)can't disambiguate. Low priority (no real-world failures reported)const_declarationnot in query files — JS/TS constants never extractedfunction_definitionandclass_definition— no decorators, no method distinctionis_doc_comment()line-prefix heuristic misclassifies strings containing///,#,"""as comments. AST-based comment detection needed for 100% accuracy.import ()blocks not handled by line-by-line detection?at line end andResult<match in strings/comments. Also:detect_intents()threshold of >2 matches may produce false positives in large diffs with common Rust patterns.test_foo.rs, nested test dirs not matched by stem-based matchingChangeDetailvariants not implemented — Partially addressed (FR-071). Semantic marker detection implemented:UnsafeAdded/Removed,DeriveAdded/Removed,DecoratorAdded/Removed,ExportAdded/Removed,MutabilityChanged,GenericConstraintChanged. Remaining stubs:AttributeAdded/Removed(generic, non-derive),GenericChanged(type parameter changes).BodyUnchangedredundant when other changes exist — omit to save tokensuse foo as baralias not capturedSplitter & Message Quality
use crate::...::file_B_stemfor stronger dependency edgesfeatindicatorCLI & UX
--dry-runoutput — display/streaming issue--dry-runoutputcommitbee doctordoesn't verify cloud provider connectivity — only checks API key presence (is_some()). Ollama gets a real health check but OpenAI/Anthropic don't. Callprovider.verify().await.TERM=dumbor limited terminals. PRD UX-002 specifies this.Observability
#[instrument]spans on key pipeline functions (generate_commit,extract_symbols,build). Would improve debugging and timing visibility.F-024: Test count documentation drift— Fixed in v0.6.0. All docs now say 424.Code Quality & Modernization
CodeSymbolall-pub fields without#[non_exhaustive]— only matters if publishing as libraryPromptContextall-pub fields without#[non_exhaustive]— same caveatreqwestTLS backend not explicitly pinned. Synthesizer notes: reqwest 0.13 uses platform-native TLS by default. Pin torustls-tlsif pure-Rust stack desired, but native TLS is acceptable.history.rs:49— use.round() as u32main.rsto consume library crate instead of re-declaring module treeArchitecture
git showcalls to gix object reading — reduce process overhead on large reposAnalyzerService— trait-based per-language isolationrationalefield for debugging incorrect inferenceCompliance
overflow-checks(F-001), binaryforbid(unsafe)gap (F-028)Dependency Health
All 30 direct dependencies clean. No CVEs. Full audit by
cargo audit(433 crate graph).Methodology
Audit process details
Dialectic Verification Pipeline
The audit uses a multi-stage dialectic verification process designed to prevent anchoring bias:
This Audit
forbid(unsafe)binary crate gap), F-030 (loopback enforcement), and F-032 (non-hermetic tests) — the latter verified by actually executingcargo teston the codebase. This validates that independent multi-model review catches issues no single reviewer would find.