Commit e8d75ff

docs: update PRD to v4.4 with FR-073, FR-074, FR-075, TR-008, PE-007
Added future requirements for function move detection via AST fingerprinting, dependency-based commit splitting, configurable file categorization, LLM output quality testing, and token-accurate budget management. Reflects audit findings and planned improvements.
1 parent a6c22fb commit e8d75ff

1 file changed

Lines changed: 25 additions & 4 deletions

PRD.md

@@ -6,19 +6,20 @@ SPDX-License-Identifier: AGPL-3.0-only OR LicenseRef-Commercial
 
 # CommitBee — Product Requirements Document
 
-**Version**: 4.3
-**Date**: 2026-03-27
+**Version**: 4.4
+**Date**: 2026-03-28
 **Status**: Active
 **Author**: [Sephyi](https://github.com/Sephyi) + [Claude Opus 4.6](https://www.anthropic.com/news/claude-opus-4-6)
 
 ## Changelog
 
 <details>
-<summary>Revision history (v3.3 → v4.3)</summary>
+<summary>Revision history (v3.3 → v4.4)</summary>
 
 | Version | Date | Summary |
 |---------|------------|---------|
-| 4.3 | 2026-03-27 | v0.6.0-rc.1 deep semantic understanding: parent scope, import detection, doc-vs-code classification, structural AST diffs (AstDiffer + SymbolDiff), STRUCTURED CHANGES prompt section, token budget rebalance, T3 semantic markers (FR-071), change intent detection (FR-072). 424 tests. |
+| 4.4 | 2026-03-27 | Added future requirements from audit: FR-073 (move detection), FR-074 (AST-based splitting), FR-075 (configurable categorization), TR-008 (LLM quality testing), PE-007 (token-accurate budgets). |
+| 4.3 | 2026-03-27 | v0.6.0-rc.1 deep semantic understanding: parent scope, import detection, doc-vs-code, structural AST diffs, semantic markers (FR-071), change intent (FR-072). 424 tests. |
 | 4.2 | 2026-03-22 | v0.5.0 hardening: security fixes (SSRF prevention, streaming caps), prompt optimization (budget fix, evidence omission, emoji removal), eval harness (36 fixtures, per-type reporting), test coverage (15+ new tests), API hygiene (pub(crate) demotions), 5 fuzz targets. 359 tests. |
 | 4.1 | 2026-03-22 | AST context overhaul (v0.5.0): full signature extraction from tree-sitter nodes, semantic change classification (whitespace vs body vs signature), old→new signature diffs, cross-file connection detection, formatting auto-detection via symbols. 359 tests. |
 | 4.0 | 2026-03-13 | PRD normalization: aligned phases with shipped versions (v0.2.0/v0.3.x/v0.4.0), collapsed revision history, unified status markers, resolved stale critical issues, canonicalized test count to 308, removed dead cross-references. FR-031 (Exclude Files) and FR-033 (Copy to Clipboard) shipped. |
@@ -549,6 +550,18 @@ Automatic semantic version bumps based on commit types. Natural extension of con
 
 Run commitbee in CI to validate or rewrite commit messages. Key differentiator for team adoption.
 
+#### FR-073: Function Move Detection
+
+Detect when a function is moved between files or within a file with zero semantic changes, using AST structural fingerprinting (hash tree topology ignoring identifiers). Classify as `refactor` rather than add+delete. Significantly improves commit type accuracy for common refactoring patterns.
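Reviewer note on FR-073: the fingerprinting idea can be sketched roughly as follows. This is only an illustration, not CommitBee's implementation. `Node` is a hypothetical stand-in for a parsed syntax node (for example one produced by tree-sitter), and the choice to hash literal text while skipping `identifier` text is an assumption made for the example.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical, simplified syntax node (not a real tree-sitter type).
#[derive(Debug, Clone)]
pub struct Node {
    pub kind: &'static str,
    pub text: Option<String>,
    pub children: Vec<Node>,
}

impl Node {
    pub fn leaf(kind: &'static str, text: &str) -> Self {
        Node { kind, text: Some(text.to_string()), children: Vec::new() }
    }

    pub fn branch(kind: &'static str, children: Vec<Node>) -> Self {
        Node { kind, text: None, children }
    }
}

/// Structural fingerprint: depends on node kinds, child order, and
/// non-identifier leaf text, but not on identifier names.
pub fn fingerprint(node: &Node) -> u64 {
    let mut hasher = DefaultHasher::new();
    hash_topology(node, &mut hasher);
    hasher.finish()
}

fn hash_topology(node: &Node, hasher: &mut DefaultHasher) {
    node.kind.hash(hasher);
    node.children.len().hash(hasher);
    // Assumption: identifier text is skipped, other leaf text (literals) is kept.
    if node.kind != "identifier" {
        node.text.hash(hasher);
    }
    for child in &node.children {
        hash_topology(child, hasher);
    }
}
```

Under this sketch, a deleted function and an added function with equal fingerprints would be candidates for a move, which the classifier could then report as `refactor`.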
+
+#### FR-074: AST-Based Dependency Analysis for Splitting
+
+Replace hardcoded path heuristics (`GENERIC_DIRS`, `KNOWN_PAIRS`) in the commit splitter with actual code dependency analysis derived from AST imports and call patterns. Produces higher-quality split groups based on real code relationships rather than file proximity.
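Reviewer note on FR-074: one possible shape for the grouping step, assuming import edges between changed files have already been extracted from the AST. The function name `split_groups` and the use of connected components are assumptions for illustration. The PRD does not specify the grouping algorithm.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Group changed files into connected components of the import graph.
/// `imports` holds hypothetical (importer, imported) pairs; edges that
/// point at files outside the change set are ignored.
pub fn split_groups<'a>(
    files: &[&'a str],
    imports: &[(&'a str, &'a str)],
) -> Vec<Vec<String>> {
    // Undirected adjacency list keyed by file path.
    let mut adj: BTreeMap<&'a str, BTreeSet<&'a str>> =
        files.iter().map(|f| (*f, BTreeSet::new())).collect();
    for &(from, to) in imports {
        if adj.contains_key(from) && adj.contains_key(to) {
            adj.get_mut(from).unwrap().insert(to);
            adj.get_mut(to).unwrap().insert(from);
        }
    }

    let mut seen: BTreeSet<&'a str> = BTreeSet::new();
    let mut groups = Vec::new();
    for &start in adj.keys() {
        if !seen.insert(start) {
            continue; // already assigned to an earlier group
        }
        // Depth-first walk collects every file reachable from `start`.
        let mut group = vec![start.to_string()];
        let mut stack = vec![start];
        while let Some(file) = stack.pop() {
            for &next in &adj[file] {
                if seen.insert(next) {
                    group.push(next.to_string());
                    stack.push(next);
                }
            }
        }
        group.sort();
        groups.push(group);
    }
    groups
}
```

Files with no dependency relationship to any other changed file end up in singleton groups, so they do not get merged just because they share a directory.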
+
+#### FR-075: Configurable File Categorization
+
+Allow users to define custom file category patterns in config (e.g., `[categorization] build_patterns = ["Tiltfile", "*.bazel"]`, `source_extensions = ["rs", "ts", "custom_lang"]`). Currently all patterns are hardcoded in `FileCategory::from_path()`. Enables support for proprietary build systems and custom file types.
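Reviewer note on FR-075: a minimal sketch of how user-supplied patterns might be consulted. `Category`, `CategorizationConfig`, and `categorize` are hypothetical names, not the existing `FileCategory` API. Pattern matching is deliberately limited to exact file names and `*suffix` patterns, and a real implementation would more likely use a glob library.

```rust
use std::path::Path;

/// Hypothetical, reduced category set for the example.
#[derive(Debug, PartialEq, Eq)]
pub enum Category {
    Build,
    Source,
    Other,
}

/// Mirrors the config keys named in FR-075.
pub struct CategorizationConfig {
    pub build_patterns: Vec<String>,
    pub source_extensions: Vec<String>,
}

/// Simplified matcher: `*suffix` matches by suffix, anything else must
/// equal the file name exactly.
fn matches_pattern(file_name: &str, pattern: &str) -> bool {
    match pattern.strip_prefix('*') {
        Some(suffix) => file_name.ends_with(suffix),
        None => file_name == pattern,
    }
}

pub fn categorize(path: &str, config: &CategorizationConfig) -> Category {
    let path = Path::new(path);
    let file_name = path.file_name().and_then(|n| n.to_str()).unwrap_or("");

    // Build patterns take precedence over extension-based matching.
    if config.build_patterns.iter().any(|p| matches_pattern(file_name, p)) {
        return Category::Build;
    }
    if let Some(ext) = path.extension().and_then(|e| e.to_str()) {
        if config.source_extensions.iter().any(|e| e.as_str() == ext) {
            return Category::Source;
        }
    }
    Category::Other
}
```

With the values from the requirement text, `infra/Tiltfile` and `BUILD.bazel` would be categorized as build files and `src/main.rs` as source.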
+
 ## 5. Security Requirements
 
 ### SR-001: Secret Scanning
@@ -777,6 +790,10 @@ proptest! {
 
 5 `cargo-fuzz` targets. See §4.3.
 
+### TR-008: LLM Output Quality Testing
+
+End-to-end commit message quality validation. Two modes: (1) wiremock-based deterministic testing with canned LLM responses through the full pipeline (sanitizer + validator), (2) optional live Ollama regression testing with majority-vote scoring and baseline comparison. Extends the eval harness (TR-006) from pre-LLM pipeline testing to actual output quality assurance.
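Reviewer note on TR-008: the majority-vote step for live runs could look roughly like this. The function name and the strict-majority rule (more than half of the samples must agree) are assumptions. The requirement does not define how votes are scored.

```rust
use std::collections::BTreeMap;

/// Return the commit type produced by a strict majority of sampled runs,
/// or `None` when no type wins more than half of them.
pub fn majority_vote<'a>(samples: &[&'a str]) -> Option<&'a str> {
    let mut counts: BTreeMap<&'a str, usize> = BTreeMap::new();
    for &sample in samples {
        *counts.entry(sample).or_insert(0) += 1;
    }
    counts
        .into_iter()
        .find(|&(_, n)| n * 2 > samples.len())
        .map(|(label, _)| label)
}
```

A regression check could then compare the voted type for each fixture against a stored baseline and flag fixtures where the result is `None` as unstable.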
+
 ## 9. Distribution Requirements
 
 ### DR-001: cargo install
@@ -851,6 +868,10 @@ Binary files never included as diff content. Listed in file list with change sta
 
 Invalid JSON → retry once with repair prompt. Second failure → heuristic extraction (type from file categories, first coherent sentence as description). Never retry more than once.
 
+### PE-007: Token-Accurate Budget Management
+
+Replace character-based budget estimation (~4:1 char-to-token ratio approximation) with actual BPE/tiktoken token counting for accurate LLM context window utilization. Maximizes prompt quality by filling available tokens precisely rather than under/over-estimating. Consider lightweight Rust BPE implementation or pre-computed token tables per model family.
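Reviewer note on PE-007: one way to make the counting strategy swappable is a small trait, sketched below. `TokenCounter`, `CharRatioCounter`, and `take_within_budget` are hypothetical names. Only the character-ratio estimator described as the current approach is implemented here, and a BPE-backed counter would be a second implementation of the same trait.

```rust
/// Abstraction over token counting so a BPE-based counter can replace
/// the character-ratio estimate without touching budget logic.
pub trait TokenCounter {
    fn count(&self, text: &str) -> usize;
}

/// Approximation described in PE-007: roughly `chars_per_token`
/// characters per token, rounded up.
pub struct CharRatioCounter {
    pub chars_per_token: usize,
}

impl TokenCounter for CharRatioCounter {
    fn count(&self, text: &str) -> usize {
        let ratio = self.chars_per_token.max(1); // guard against division by zero
        (text.chars().count() + ratio - 1) / ratio
    }
}

/// Keep prompt sections in order until the next one would exceed the budget.
pub fn take_within_budget<'a>(
    sections: &[&'a str],
    budget: usize,
    counter: &dyn TokenCounter,
) -> Vec<&'a str> {
    let mut used = 0;
    let mut kept = Vec::new();
    for &section in sections {
        let cost = counter.count(section);
        if used + cost > budget {
            break;
        }
        used += cost;
        kept.push(section);
    }
    kept
}
```

Because the budget logic only sees `count`, any estimation error in the counter translates directly into dropped or overflowing sections, which is the gap this requirement aims to close.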
+
 ## 11. Roadmap Summary
 
 | Phase | Version | Status | Focus |
