Skip to content

UPSTREAM PR #2457: gix-blame: Incremental#31

Open
loci-dev wants to merge 8 commits intomainfrom
loci/pr-2457-vmg-blame-incremental
Open

UPSTREAM PR #2457: gix-blame: Incremental#31
loci-dev wants to merge 8 commits intomainfrom
loci/pr-2457-vmg-blame-incremental

Conversation

@loci-dev
Copy link
Copy Markdown

@loci-dev loci-dev commented Mar 9, 2026

Note

Source pull request: GitoxideLabs/gitoxide#2457

Hiiiiii @Byron! Thanks for all your work on the library!

I've been playing around with the new blame APIs that @cruessler developed. The existing gix_blame::file was not fitting the use case we needed at Cursor, so I took a stab at implementing an equivalent to git blame --incremental. The changes were quite minimal because I just left gix_blame::file as a thin wrapper over gix_blame::incremental.

I then tried benchmarking the incremental API against Git itself and the numbers were not good at all. After some review, I noticed that the gix-commitgraph crate just didn't support the changed-paths bloom filter cache from Git, so I took a stab at implementing those too.

The results are very good. These are for tools/clang/spanify/Spanifier.cpp in the Chromium repository, which is a very very hairy file:

gix-blame::incremental/without-commit-graph
                        time:   [14.852 s 14.895 s 14.944 s]
                        change: [+0.2968% +0.7623% +1.2529%] (p = 0.00 < 0.05)
                        Change within noise threshold.
gix-blame::incremental/with-commit-graph
                        time:   [287.55 ms 290.30 ms 292.85 ms]
                        change: [−3.1181% −1.6720% −0.4502%] (p = 0.11 > 0.05)
                        No change in performance detected.

Since all these changes are quite related, I'm putting them up here in a single PR. Every commit is self contained and explains the changes on the commit message so if you'd like me to split this into smaller PRs just let me know.

Thanks!

@loci-review
Copy link
Copy Markdown

loci-review bot commented Mar 9, 2026

Overview

Analysis of 30,859 functions (1,785 modified, 4,602 new, 4,540 removed) across two binaries shows minor overall impact with one performance-critical regression.

Binaries analyzed:

  • target.aarch64-unknown-linux-gnu.release.gix: +0.19% power consumption (+1,508 nJ)
  • target.aarch64-unknown-linux-gnu.release.ein: +0.03% power consumption (+87 nJ)

Changes: 6 commits adding bloom filter infrastructure for blame optimization (18 modified files, 5 added).

Function Analysis

Critical regression:

  • gix_worktree::Stack::at_path: Response time +50.7% (+6,228 ns), throughput time +9,149% (+5,475 ns). No source changes detected—aggressive compiler inlining expanded function from 6 to 200+ basic blocks. This function is called frequently during status operations, causing cumulative impact: ~62ms overhead for 10,000 files, ~620ms for 100,000 files.

Repository lifecycle overhead:

  • gitoxide_core::repository::cat::cat: Response time +289% (+139,652 ns), driven by Repository drop_in_place +2,010% (+133,337 ns). Bloom filter initialization during commit-graph loading adds atomic operations and synchronization to cleanup. Affects all Repository instances.

Expected changes:

  • gix_blame::Statistics::fmt: Response time +96% (+5,894 ns). Intentional—struct expanded from 5 to 10 fields for bloom filter monitoring. Debug output only, not hot path.

Improvements:

  • serde_json::SerializeMap::serialize_field: Response time -84% (-49,824 ns). Upstream optimization replacing generic serializers with specialized implementations (91x faster).

Non-critical regressions: Debug formatting functions (gix_glob::pattern::Mode::fmt +1,226%, core::array::TryFromSliceError::fmt +547%) stem from Rust 1.82 MSRV upgrade changing stdlib Debug trait implementations. Error paths only—minimal real-world impact.

Additional Findings

The worktree stack regression warrants investigation as it affects performance-critical status operations in large repositories. Bloom filter overhead (0.19% power increase) is acceptable for the functionality gained, but consider lazy loading to avoid initialization costs for non-blame operations.

🔎 Full breakdown: Loci Inspector
💬 Questions? Tag @loci-dev

vmg added 7 commits March 11, 2026 12:04
The `git commit-graph write` command also supports writing a separate
section on the cache file that contains information about the paths
changed between a commit and its first parent.

This information can be used to significantly speed up the performance
of some traversal operations, such as `git log -- <PATH>` and `git
blame`.

This commit teaches the git-commitgraph crate in gitoxide how to parse
and access this information. We've only implemented support for reading
v2 of this cache, because v1 is deprecated in Git as it can return bad
results in some corner cases.

The implementation is 100% compatible with Git itself; it uses the exact
same version of murmur3 that Git is using, including the seed hashes.
Implement a gix_blame::incremental API that yelds the blame entries as
they're discovered, similarly to Git's `git blame --incremental`.

The implementation simply takes the original gix_blame::file and
replaces the Vec of blame entries with a generic BlameSink trait.

The original gix_blame::file is now implemented as a wrapper for
gix_blame::incremental, by implementing the BlameSink trait on
Vec<BlameEntry> and sorting + coalescing the entries before returning.
Use the new changed-path bloom filters from the commit graph to greatly
speed up blame our implementation. Whenever we find a rejection on the
bloom filter for the current path, we skip it altogether and pass the
blame without diffing the trees.
Implement the log_file method in gitoxide-core, which allows performing
path-delimited log commands. With the new changed paths bloom filter, it
is not possible to perform this operation very efficiently.
Change `process_changes` to take `&[Change]` instead of `Vec<Change>`,
eliminating the `changes.clone()` heap allocation at every call site.

Replace the O(H×C) restart-from-beginning approach with a cursor that
advances through the changes list across hunks. Non-suspect hunks are
now skipped immediately. When the rare case of overlapping suspect
ranges is detected (from merge blame convergence), the cursor safely
resets to maintain correctness.
Compare the performance of the implementation with and without the
commit graph cache.

gix-blame::incremental/without-commit-graph
                        time:   [14.852 s 14.895 s 14.944 s]
                        change: [+0.2968% +0.7623% +1.2529%] (p = 0.00 < 0.05)
                        Change within noise threshold.
gix-blame::incremental/with-commit-graph
                        time:   [287.55 ms 290.30 ms 292.85 ms]
                        change: [−3.1181% −1.6720% −0.4502%] (p = 0.11 > 0.05)
                        No change in performance detected.

Signed-off-by: Vicent Marti <vmg@strn.cat>
The BlameSink type now returns a std::ops::ControlFlow value that can be
used to interrupt the blame early.

Signed-off-by: Vicent Marti <vmg@strn.cat>
@loci-dev loci-dev force-pushed the loci/pr-2457-vmg-blame-incremental branch from e6b8998 to a85c1fe Compare March 12, 2026 08:09
@loci-review
Copy link
Copy Markdown

loci-review bot commented Mar 12, 2026

Overview

Analysis of 30,969 functions (1,831 modified, 4,580 new, 4,575 removed) across two binaries shows minor overall impact. Changes implement blame optimization features (bloom cache, incremental API) with negligible power consumption effects.

Binaries analyzed:

  • target.aarch64-unknown-linux-gnu.release.gix: +0.052% power (+414 nJ)
  • target.aarch64-unknown-linux-gnu.release.ein: -0.164% power (-492 nJ)

Key commits: 7 commits by Vicent Marti adding bloom cache support, incremental blame API, and process_changes optimization.

Function Analysis

Significant Improvement:

  • gix_status closure cleanup: -99.7% response time (155,214ns → 411ns), benefiting the status checking hot path with 378x speedup by eliminating expensive drop_slow chains

Moderate Regressions (Hot Path):

  • gix_pack::Bundle::inner_write: +179% response time (369µs → 1,029µs), +218% throughput. No source changes detected; compiler optimization differences with 3x more basic blocks and aggressive inlining
  • std::io::BufReader::read_exact: +90% response time (2,803ns → 5,332ns), +126% throughput. Doubled lock/unlock synchronization cycles affecting all I/O operations
  • gix_pack traversal closure: +161% throughput (140ns → 366ns), minimal response impact (+0.3%)

Intentional Changes:

  • gix_blame::types::Statistics Debug formatter: +97% response time (6,151ns → 12,121ns) due to struct expansion from 5 to 10 fields for bloom filter tracking. Diagnostic code only, justified by feature addition

Compiler Artifacts (Low Impact):

  • Multiple Debug trait implementations show 287-1,023% regressions (gix_attributes::Name, gix_filter::Status, error formatters) with no source changes. Compiler switched from string-based to field-based formatting. Affects only error reporting/logging, not hot paths

Measurement Artifacts:

  • gix_odb::Store::verify_integrity: +2,474% response time but -84.6% throughput (function's own code faster). No source changes; likely environmental differences in test data

Other analyzed functions showed minimal changes in non-critical paths (CLI parsing, external dependencies).

Additional Findings

The BufReader::read_exact regression warrants investigation—doubled synchronization overhead affects all pack I/O operations. The status cleanup improvement directly benefits frequently-used git status operations. Verification function regressions appear to be measurement artifacts rather than genuine code issues, as throughput time improved significantly despite response time increases.

🔎 Full breakdown: Loci Inspector
💬 Questions? Tag @loci-dev

@loci-dev loci-dev force-pushed the main branch 3 times, most recently from 3deba97 to 9b41e5f Compare March 18, 2026 07:49
@loci-dev loci-dev force-pushed the main branch 6 times, most recently from 95ef755 to a9e7940 Compare March 26, 2026 07:51
@loci-dev loci-dev force-pushed the main branch 3 times, most recently from 06bc48e to 0e47b1e Compare March 30, 2026 07:05
@stojanai
Copy link
Copy Markdown

@loci-dev can you provide me the flamegraph for this function gix_odb::Store::verify_integrity: before and after

@stojanai
Copy link
Copy Markdown

@loci-dev ROM/RAM Usage

Management criteria shall be defined depending on whether dynamic memory is used.- If dyn
amic memory is used: secure up to 20% margin- If not used: must satisfy system-level requir
ements . Use : - ROM: symbol sizes from extract-symbols, .text section size = code footprint

  • RAM (stack): per-function frame sizes from stack-depth
  • RAM (static): .data and .bss section sizes from ELF analysis
  • Claude can read linker map files to compute section totals

@stojanai
Copy link
Copy Markdown

@loci-dev Average CPU Load ≤ 70%, Maximum CPU Load ≤ 90% . - Predict CET (Core Execution Time) per function/task via execution_time_ns

  • If task period (PER) is provided by user: compute single-task CPU contribution as CET / PER
    Assume following configuration Yocto running on A53 arm core . on s32Gv2 board: - System-wide CPU load (requires knowledge of all tasks, OS scheduler, idle time)
  • IPT (Initial Pending Time) — OS scheduler dependent
  • Runtime measurement under vehicle operating conditions (B+, IG ON, CAN/LIN/Ethernet traffic)
  • Average vs. maximum CPU load over time

Signed-off-by: Vicent Marti <vmg@strn.cat>
@loci-dev loci-dev force-pushed the main branch 3 times, most recently from cdbe120 to 78a7ab5 Compare April 11, 2026 07:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants