docs: update docs for v0.6.0-rc.1 with AstDiffer and 410 tests
Documents shipped v0.6.0 features: parent scope extraction, structural
AST diffs via AstDiffer, import change detection, doc-vs-code
SpanChangeKind classification, and adaptive token budgeting. Updates
test count from 367 to 410 across README, DOCS, CHANGELOG, and PRD.
CHANGELOG.md (11 additions & 2 deletions)
@@ -8,7 +8,7 @@ SPDX-License-Identifier: AGPL-3.0-only OR LicenseRef-Commercial
8
8
9
9
All notable changes to CommitBee are documented here.
10
10
11
-
## `v0.6.0` — Deep Understanding (current, in progress)
11
+
## `v0.6.0-rc.1` — Deep Understanding (release candidate)
12
12
13
13
### Semantic Analysis
14
14
@@ -18,11 +18,20 @@ All notable changes to CommitBee are documented here.
18
18
- **Test file correlation** — New `RELATED FILES:` prompt section shows when source files and their matching test files are both staged. Stem-based matching, capped at 5 entries.
19
19
- **Structural AST diffs** — `AstDiffer` compares old and new tree-sitter nodes for modified symbols, producing structured `SymbolDiff` descriptions (parameter added, return type changed, visibility changed, async toggled, body modified). Shown as `STRUCTURED CHANGES:` section in the prompt.
20
20
- **Whitespace-aware body comparison** — Body diff uses character-stream stripping so reformatting doesn't produce false `BodyModified` results.
21
+
- **Structured changes in prompt** — New `STRUCTURED CHANGES:` section in the LLM prompt shows concise one-line descriptions of what changed per symbol (e.g., `CommitValidator::validate(): +param strict: bool, return bool → Result<()>, body modified`). Omitted when no structural diffs exist.
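
The whitespace-aware body comparison described above can be sketched as follows. This is a minimal illustration, not CommitBee's actual code: the function name `bodies_equal_ignoring_whitespace` is hypothetical, and the real `AstDiffer` may strip characters differently. It compares the two bodies as character streams with every whitespace character skipped, so a pure reformat compares equal.

```rust
/// Hypothetical helper (not from the CommitBee source): returns true when
/// two function bodies differ only in whitespace.
///
/// Both inputs are walked as character streams with whitespace filtered out,
/// so no intermediate strings are allocated.
fn bodies_equal_ignoring_whitespace(old_body: &str, new_body: &str) -> bool {
    let old_chars = old_body.chars().filter(|c| !c.is_whitespace());
    let new_chars = new_body.chars().filter(|c| !c.is_whitespace());
    // Iterator::eq compares the two streams element by element.
    old_chars.eq(new_chars)
}
```

One known limitation of this simple form: whitespace inside string literals is also ignored, so `"a b"` and `"ab"` compare equal.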
21
22
22
23
### Type Inference
23
24
24
25
- **Test-to-code ratio** — When >80% of additions are in test files, suggests `test` type even with source files present. Uses cross-multiplication to avoid integer truncation.
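
To see why cross-multiplication matters for the 80% check, here is a small sketch. The function name `is_test_dominant` and the `u64` counters are assumptions made for illustration; only the comparison `test * 100 > total * 80` comes from the project's own description.

```rust
/// Hypothetical helper: true when strictly more than 80% of added lines
/// are in test files.
///
/// Cross-multiplying keeps the comparison exact in integer arithmetic.
/// Computing `test_additions * 100 / total_additions` first would truncate
/// 80.5% down to 80 and miss the threshold.
fn is_test_dominant(test_additions: u64, total_additions: u64) -> bool {
    test_additions * 100 > total_additions * 80
}
```

With 161 test additions out of 200 (80.5%), the truncating form `161 * 100 / 200` yields 80 and fails a `> 80` check, while the cross-multiplied form compares 16100 against 16000 and passes.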
25
26
27
+
### Prompt Quality
28
+
29
+
- **Token budget rebalance** — Symbol budget reduced from 30% to 20% when structural diffs are available, freeing space for the raw diff. SYSTEM_PROMPT updated to guide the LLM to prefer STRUCTURED CHANGES for signature details.
30
+
31
+
### Testing
32
+
33
+
- **410 tests** total (up from 367 at v0.5.0).
34
+
26
35
## `v0.5.0` — Beyond the Diff
27
36
28
37
### Semantic Analysis
@@ -56,7 +65,7 @@ All notable changes to CommitBee are documented here.
56
65
- **Evaluation harness** — 36 fixtures covering all 11 commit types, AST features, and edge cases. Per-type accuracy reporting with `EvalSummary`.
57
66
- **15+ new unit tests** — Coverage for `detect_primary_change`, `detect_metadata_breaking`, `detect_bug_evidence` (all 7 patterns), Deleted/Renamed status, signature edge cases, connection content assertions.
DOCS.md (15 additions & 7 deletions)
@@ -86,11 +86,11 @@ Here's what each step actually does:
86
86
87
87
**1. Git Service** reads your staged changes using `gix` for repo discovery and the git CLI for diffs. Paths are parsed with NUL-delimited output (`-z` flag) so filenames with spaces or special characters work correctly.
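
As an illustration of why NUL-delimited output is robust, the sketch below splits raw bytes on `\0` rather than on newlines. It assumes the bytes came from a git command run with `-z` (for example `git diff --cached --name-only -z`; the exact command CommitBee runs is not stated here), and the function name `parse_nul_delimited_paths` is hypothetical.

```rust
/// Hypothetical helper: splits NUL-delimited path output into owned strings.
///
/// Splitting on the NUL byte means spaces, quotes, and even newlines inside
/// a filename are preserved as part of the path instead of breaking it up.
fn parse_nul_delimited_paths(output: &[u8]) -> Vec<String> {
    output
        .split(|&byte| byte == 0)
        // The trailing NUL produces one empty chunk; drop it.
        .filter(|chunk| !chunk.is_empty())
        // Non-UTF-8 bytes are replaced rather than causing a failure.
        .map(|chunk| String::from_utf8_lossy(chunk).into_owned())
        .collect()
}
```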
88
88
89
-
**2. Tree-sitter Analyzer** parses both the staged version and the HEAD version of every changed file — in parallel, using `rayon` across CPU cores. It extracts **full signatures** (e.g., `pub fn connect(host: &str, timeout: Duration) -> Result<Connection>`) by taking the definition node text before the body child. Modified symbols show old → new signature diffs. Cross-file connections are detected (caller+callee both changed). Symbols are tracked in three states: added, removed, or modified-signature.
89
+
**2. Tree-sitter Analyzer** parses both the staged version and the HEAD version of every changed file — in parallel, using `rayon` across CPU cores. It extracts **full signatures** (e.g., `pub fn connect(host: &str, timeout: Duration) -> Result<Connection>`) by taking the definition node text before the body child. Methods include their **parent scope** (enclosing impl, class, or trait — e.g., `CommitValidator::validate`). Modified symbols show old → new signature diffs, with **structural AST diffs** that describe exactly what changed (parameters added/removed, return type changed, visibility changed, etc.). Cross-file connections are detected (caller+callee both changed). Symbols are tracked in three states: added, removed, or modified-signature, with a **doc-vs-code distinction** indicating whether changes were documentation-only, code-only, or mixed.
90
90
91
91
**3. Commit Splitter** looks at your staged changes and decides whether they contain logically independent work. It uses diff-shape fingerprinting (what kind of changes — additions, deletions, modifications) combined with Jaccard similarity on content vocabulary to group files. If it finds multiple concerns, it offers to split them into separate commits.
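
The Jaccard similarity mentioned above is the size of the intersection of two sets divided by the size of their union. The sketch below shows the calculation on token sets; the tokenizer in `vocabulary` and both function names are assumptions for illustration, and CommitBee's actual vocabulary extraction and grouping thresholds may differ.

```rust
use std::collections::HashSet;

/// Hypothetical tokenizer: splits text into identifier-like tokens.
fn vocabulary(text: &str) -> HashSet<&str> {
    text.split(|c: char| !c.is_alphanumeric() && c != '_')
        .filter(|token| !token.is_empty())
        .collect()
}

/// Jaccard similarity: |A ∩ B| / |A ∪ B|, in the range 0.0..=1.0.
/// Two empty sets are treated as having similarity 0.0 (a choice made here).
fn jaccard_similarity(a: &HashSet<&str>, b: &HashSet<&str>) -> f64 {
    let union = a.union(b).count();
    if union == 0 {
        return 0.0;
    }
    let intersection = a.intersection(b).count();
    intersection as f64 / union as f64
}
```

For `fn parse_config(path: &str)` and `fn load_config(path: &str)`, the shared tokens are `fn`, `path`, and `str` out of five distinct tokens, giving a similarity of 0.6.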
92
92
93
-
**4. Context Builder** assembles a budget-aware prompt. It classifies modified symbols as whitespace-only or semantic (via character-stream comparison), computes evidence flags (mechanical change? public APIs removed? bug-fix evidence?), detects cross-file connections, calculates the character budget for the subject line, and packs context within the token limit (~6K tokens, 30/70 symbol/diff split when signatures present).
93
+
**4. Context Builder** assembles a budget-aware prompt. It classifies modified symbols as whitespace-only or semantic (via character-stream comparison), computes evidence flags (mechanical change? public APIs removed? bug-fix evidence?), detects cross-file connections, identifies import changes and test file correlations, calculates the character budget for the subject line, and packs context within the token limit (~6K tokens). The token budget adapts: when structural AST diffs are available, symbols get 20% of the budget (diffs carry more detail); when only signatures are available, symbols get 30%.
94
94
95
95
**5. LLM Provider** streams the prompt to your chosen model (Ollama, OpenAI, or Anthropic) and collects the response token by token.
96
96
@@ -107,6 +107,10 @@ CommitBee doesn't just send a diff. The prompt includes:
107
107
- **Evidence flags** telling the LLM deterministic facts about the change
108
108
- **Symbol changes with full signatures** — `[+] pub fn connect(host: &str) -> Result<()>`, not just "Function connect"
109
109
- **Signature diffs** — `[~] old_sig → new_sig` for modified symbols
- **Doc-vs-code annotations** — modified symbols tagged `[docs only]` or `[docs + code]` when the change is documentation-only or mixed
110
114
- **Cross-file connections** — `validator calls parse() — both changed`
111
115
- **Primary change detection** — which file has the most significant changes
112
116
- **Constraints** — rules the LLM must follow based on evidence (e.g., "no bug-fix comments found, prefer refactor over fix")
@@ -570,7 +574,9 @@ For supported languages, symbols are tracked in three states:
570
574
- **Removed** `[-]` — Deleted symbol
571
575
- **Modified (signature changed)** `[~]` — Symbol exists in both versions but its signature changed
572
576
573
-
This information appears in the prompt as a `SYMBOLS CHANGED` section, giving the LLM precise knowledge of what was structurally modified.
577
+
Modified symbols include additional annotations: `[docs only]` when only documentation/comments changed, `[docs + code]` when both documentation and code changed. Methods show their parent scope (e.g., `CommitValidator::validate` rather than just `validate`).
578
+
579
+
This information appears in the prompt as a `SYMBOLS CHANGED` section. When structural AST diffs are available, a separate `STRUCTURED CHANGES` section provides precise details like `+param timeout`, `return Result<()> → Result<Error>`, or `+field name`.
**Streaming with Cancellation** — All providers support Ctrl+C cancellation via `tokio_util::CancellationToken`. The streaming display runs in a separate tokio task with `tokio::select!` for responsive cancellation.
680
688
681
-
**Token Budget** — The context builder tracks character usage (~4 chars per token) and truncates the diff if it exceeds the budget, prioritizing the most important files. The default 24K char budget (~6K tokens) is safe for 8K context models.
689
+
**Token Budget** — The context builder tracks character usage (~4 chars per token) and truncates the diff if it exceeds the budget, prioritizing the most important files. The budget adapts based on available information: when structural AST diffs are present, the symbol allocation shrinks (20%) since the diffs carry precise detail; when only signatures are available, symbols get 30%. The default 24K char budget (~6K tokens) is safe for 8K context models.
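
The adaptive split can be illustrated with a few lines of arithmetic. This is a simplified sketch under stated assumptions: the names `BudgetSplit` and `split_budget` are hypothetical, and it treats everything outside the symbol allocation as diff space, whereas the real context builder also has to fit other prompt sections.

```rust
/// Default character budget from the docs (~6K tokens at ~4 chars per token).
const DEFAULT_BUDGET_CHARS: usize = 24_000;

/// Hypothetical result type: how many characters each section may use.
#[derive(Debug, PartialEq)]
struct BudgetSplit {
    symbol_chars: usize,
    diff_chars: usize,
}

/// Hypothetical helper: symbols get 20% when structural AST diffs are
/// available and 30% when only signatures are; the rest goes to the raw diff.
fn split_budget(total_chars: usize, has_structural_diffs: bool) -> BudgetSplit {
    let symbol_percent = if has_structural_diffs { 20 } else { 30 };
    let symbol_chars = total_chars * symbol_percent / 100;
    BudgetSplit {
        symbol_chars,
        diff_chars: total_chars - symbol_chars,
    }
}
```

With the default budget, the structural-diff case leaves 19,200 characters for the raw diff, compared with 16,800 when only signatures are available.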
682
690
683
691
**Single Source of Truth for Types** — `CommitType::ALL` is a const array that defines all valid commit types. The system prompt's type list is verified by a `#[test]` (at test time, not compile time) to match this array exactly.
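
The pattern can be sketched as a const array plus a check that a prompt string lists exactly the same types in the same order. Everything below is illustrative: the enum is trimmed to three variants, and `as_str`, `SYSTEM_PROMPT_TYPES`, the `|` separator, and `prompt_types_match_all` are assumptions, not CommitBee's actual definitions.

```rust
/// Trimmed, hypothetical version of the commit type enum (three variants only).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CommitType {
    Feat,
    Fix,
    Test,
}

impl CommitType {
    /// Single source of truth: every valid type appears here exactly once.
    const ALL: [CommitType; 3] = [CommitType::Feat, CommitType::Fix, CommitType::Test];

    fn as_str(self) -> &'static str {
        match self {
            CommitType::Feat => "feat",
            CommitType::Fix => "fix",
            CommitType::Test => "test",
        }
    }
}

/// Hypothetical stand-in for the type list embedded in the system prompt.
const SYSTEM_PROMPT_TYPES: &str = "feat|fix|test";

/// Returns true when the prompt's type list matches `CommitType::ALL` exactly.
fn prompt_types_match_all(prompt_types: &str) -> bool {
    let listed: Vec<&str> = prompt_types.split('|').collect();
    let expected: Vec<&str> = CommitType::ALL.iter().map(|t| t.as_str()).collect();
    listed == expected
}
```

In a real crate, the call to `prompt_types_match_all(SYSTEM_PROMPT_TYPES)` would sit inside a `#[test]` function so that adding a variant without updating the prompt fails the test suite.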
684
692
@@ -694,7 +702,7 @@ No panics in user-facing code paths. The sanitizer and validator are tested with
694
702
695
703
### Testing Strategy
696
704
697
-
CommitBee has 367 tests across multiple strategies:
705
+
CommitBee has 410 tests across multiple strategies:
698
706
699
707
| Strategy | What It Covers |
700
708
| --- | --- |
@@ -707,7 +715,7 @@ CommitBee has 367 tests across multiple strategies:
707
715
Run them:
708
716
709
717
```bash
710
-
cargo test # All 367 tests
718
+
cargo test # All 410 tests
711
719
cargo test --test sanitizer # Just sanitizer tests
712
720
cargo test --test integration # LLM provider mocks
713
721
COMMITBEE_LOG=debug cargo test -- --nocapture # With logging
@@ -479,7 +481,37 @@ Project-level `.commitbee.toml` can no longer override `openai_base_url`, `anthr
479
481
480
482
Subject character budget accounts for `!` suffix on breaking changes. EVIDENCE section omitted when all flags are default (~200 chars saved). Symbol marker legend added to SYSTEM_PROMPT (`[+] added, [-] removed, [~] modified`). Duplicate JSON schema removed from system prompt. Emoji replaced with text labels (`WARNING:` instead of `⚠`). CONNECTIONS instruction softened for small models. Python tree-sitter queries enhanced with `decorated_definition` support.
In `infer_commit_type`, when >80% of additions are in `FileCategory::Test` files, returns `CommitType::Test` even with source files present. Uses cross-multiplication (`test * 100 > total * 80`) to avoid integer truncation. 2 tests.
501
+
502
+
#### FR-068: Test File Correlation ✅
503
+
504
+
`detect_test_correlation()` matches staged source files to test files by file stem, producing a `RELATED FILES:` prompt section (e.g., `src/services/context.rs <-> tests/context.rs (test file)`). Capped at 5 entries. 4 tests.
505
+
506
+
#### FR-069: Structural AST Diffs ✅
507
+
508
+
`AstDiffer` in `src/services/differ.rs` compares old and new tree-sitter AST nodes for modified symbols, producing `SymbolDiff` with `Vec<ChangeDetail>` (15-variant enum: `ParamAdded`, `ParamRemoved`, `ParamTypeChanged`, `ReturnTypeChanged`, `VisibilityChanged`, `AttributeAdded`/`Removed`, `AsyncChanged`, `GenericChanged`, `BodyModified`, `BodyUnchanged`, `FieldAdded`/`Removed`/`TypeChanged`). Runs inside `extract_for_file()` while both Trees are alive (Node lifetime constraint). `extract_symbols()` returns `(Vec<CodeSymbol>, Vec<SymbolDiff>)`. Struct/enum field diffing stubbed for future. Whitespace-aware body comparison via character-stream stripping. 7 unit tests + 6 per-language integration tests.
509
+
510
+
#### FR-070: Structured Changes Prompt Section ✅
511
+
512
+
`STRUCTURED CHANGES:` section in LLM prompt renders `SymbolDiff::format_oneline()` descriptions (e.g., `CommitValidator::validate(): +param strict: bool, return bool → Result<()>, body modified (+5 -2)`). Omitted when no structural diffs exist. Token budget rebalanced: symbol budget reduced from 30% to 20% when structural diffs available, freeing space for raw diff. SYSTEM_PROMPT updated to guide LLM to prefer structured changes for signature details. 3 tests.