gix-blame: Incremental by vmg · Pull Request #2457 · GitoxideLabs/gitoxide

Vicent Martí (vmg) · 2026-03-06T17:06:26Z

Hiiiiii Sebastian Thiel (@Byron)! Thanks for all your work on the library!

I've been playing around with the new blame APIs that Christoph Rüßler (@cruessler) developed. The existing gix_blame::file was not fitting the use case we needed at Cursor, so I took a stab at implementing an equivalent to git blame --incremental. The changes were quite minimal because I just left gix_blame::file as a thin wrapper over gix_blame::incremental.

I then tried benchmarking the incremental API against Git itself and the numbers were not good at all. After some review, I noticed that the gix-commitgraph crate just didn't support the changed-paths bloom filter cache from Git, so I took a stab at implementing those too.

The results are very good. These are for tools/clang/spanify/Spanifier.cpp in the Chromium repository, which is a very very hairy file:

gix-blame::incremental/without-commit-graph
                        time:   [14.852 s 14.895 s 14.944 s]
                        change: [+0.2968% +0.7623% +1.2529%] (p = 0.00 < 0.05)
                        Change within noise threshold.
gix-blame::incremental/with-commit-graph
                        time:   [287.55 ms 290.30 ms 292.85 ms]
                        change: [−3.1181% −1.6720% −0.4502%] (p = 0.11 > 0.05)
                        No change in performance detected.

Since all these changes are quite related, I'm putting them up here in a single PR. Every commit is self contained and explains the changes on the commit message so if you'd like me to split this into smaller PRs just let me know.

Thanks!

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 21e316d86c

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "Codex (@codex) review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "Codex (@codex) address that feedback".

gix-commitgraph/src/bloom.rs

Christoph Rüßler (cruessler)

Thanks a lot for the effort, that’s much appreciated as making blame incremental was on our TODO list as well! I had a first look at the changes and left 2 comments (though this is far from an in-depth review).

gix-blame/src/types.rs

gix-blame/tests/blame.rs

The `git commit-graph write` command also supports writing a separate section on the cache file that contains information about the paths changed between a commit and its first parent. This information can be used to significantly speed up the performance of some traversal operations, such as `git log -- <PATH>` and `git blame`. This commit teaches the git-commitgraph crate in gitoxide how to parse and access this information. We've only implemented support for reading v2 of this cache, because v1 is deprecated in Git as it can return bad results in some corner cases. The implementation is 100% compatible with Git itself; it uses the exact same version of murmur3 that Git is using, including the seed hashes.

Implement a gix_blame::incremental API that yelds the blame entries as they're discovered, similarly to Git's `git blame --incremental`. The implementation simply takes the original gix_blame::file and replaces the Vec of blame entries with a generic BlameSink trait. The original gix_blame::file is now implemented as a wrapper for gix_blame::incremental, by implementing the BlameSink trait on Vec<BlameEntry> and sorting + coalescing the entries before returning.

Use the new changed-path bloom filters from the commit graph to greatly speed up blame our implementation. Whenever we find a rejection on the bloom filter for the current path, we skip it altogether and pass the blame without diffing the trees.

Implement the log_file method in gitoxide-core, which allows performing path-delimited log commands. With the new changed paths bloom filter, it is not possible to perform this operation very efficiently.

Change `process_changes` to take `&[Change]` instead of `Vec<Change>`, eliminating the `changes.clone()` heap allocation at every call site. Replace the O(H×C) restart-from-beginning approach with a cursor that advances through the changes list across hunks. Non-suspect hunks are now skipped immediately. When the rare case of overlapping suspect ranges is detected (from merge blame convergence), the cursor safely resets to maintain correctness.

Compare the performance of the implementation with and without the commit graph cache. gix-blame::incremental/without-commit-graph time: [14.852 s 14.895 s 14.944 s] change: [+0.2968% +0.7623% +1.2529%] (p = 0.00 < 0.05) Change within noise threshold. gix-blame::incremental/with-commit-graph time: [287.55 ms 290.30 ms 292.85 ms] change: [−3.1181% −1.6720% −0.4502%] (p = 0.11 > 0.05) No change in performance detected. Signed-off-by: Vicent Marti <vmg@strn.cat>

The BlameSink type now returns a std::ops::ControlFlow value that can be used to interrupt the blame early. Signed-off-by: Vicent Marti <vmg@strn.cat>

Vicent Martí (vmg) · 2026-03-11T11:06:02Z

Thanks for the review Christoph Rüßler (@cruessler)! I did the two changes; interrupting the incremental blame is a little bit involved and makes the code harder to follow so I've kept it in a separate commit. Let me know what else would you like to see changed.

Vicent Martí (vmg) · 2026-03-25T11:15:42Z

Anything else I could do to move this PR forward, Sebastian Thiel (@Byron)? I think it'd a nice addition upstream and we would very much prefer not to carry a Gitoxide fork. Thank you so much in advance!

Sebastian Thiel (Byron) · 2026-03-25T12:17:55Z

Hi, I'd also like to get this merged, and it seems there is more and more code coming up and never enough time to review.

These days that's all I am doing 😅.

Codex (@codex) review

Copilot

Pull request overview

This PR adds an incremental blame API to gix-blame (akin to git blame --incremental) and introduces commit-graph changed-path Bloom filter support in gix-commitgraph to dramatically speed up blame and path-focused history traversal.

Changes:

Add gix_blame::incremental() with a BlameSink streaming interface, and reimplement gix_blame::file() as a collecting wrapper.
Add commit-graph changed-path Bloom filter parsing and query APIs (plus compatibility handling for split commit-graphs).
Extend tests/fixtures and add a Criterion benchmark for incremental blame.

Reviewed changes

Copilot reviewed 22 out of 23 changed files in this pull request and generated 5 comments.

Show a summary per file

File	Description
gix-commitgraph/tests/fixtures/split_chain_top_without_bloom.sh	Fixture for split graphs where the top layer lacks Bloom chunks
gix-commitgraph/tests/fixtures/split_chain_changed_paths_mismatch.sh	Fixture for split graphs with mismatched Bloom settings across layers
gix-commitgraph/tests/fixtures/changed_paths_v2.sh	Fixture that writes changed-path Bloom filters v2
gix-commitgraph/tests/access/mod.rs	Tests for Bloom settings reading + disabling when chunks/offsets are invalid
gix-commitgraph/src/lib.rs	Adds Bloom settings type, bloom module export, and shared `from_be_u32()`
gix-commitgraph/src/init.rs	Disables incompatible Bloom layers when composing multi-file graphs
gix-commitgraph/src/file/mod.rs	Adds chunk IDs/constants for Bloom data/index
gix-commitgraph/src/file/init.rs	Parses BDAT/BIDX chunks and enables Bloom querying when valid
gix-commitgraph/src/file/commit.rs	Exposes commit-level Bloom query helpers (`maybe_contains_*`)
gix-commitgraph/src/file/access.rs	Adds accessors and a method to clear Bloom filter state
gix-commitgraph/src/bloom.rs	Implements Bloom hashing and query logic + graph-level helpers
gix-commitgraph/src/access.rs	Exposes top-compatible Bloom settings at the `Graph` level
gix-commitgraph/Cargo.toml	Adds `murmur3` dependency for Bloom hashing
gix-blame/tests/blame.rs	Tests that Bloom prefilter keeps blame output identical + incremental sink behavior
gix-blame/src/types.rs	Adds `BlameSink`, `IncrementalOutcome`, and Bloom-related statistics counters
gix-blame/src/lib.rs	Exports new incremental API and types
gix-blame/src/file/tests.rs	Updates tests for `process_changes()` signature change
gix-blame/src/file/mod.rs	Moves `coalesce_blame_entries()` here and optimizes `process_changes()` iteration
gix-blame/src/file/function.rs	Implements `incremental()`, adds Bloom prefilter usage, and keeps `file()` as wrapper
gix-blame/benches/incremental_blame.rs	Adds Criterion benchmark for incremental blame with/without commit-graph
gix-blame/Cargo.toml	Adds benchmark target + dev-deps for Criterion and `gix`
gitoxide-core/src/repository/log.rs	Implements path-filtered log using commit-graph Bloom prefilter
Cargo.lock	Locks new dependencies (`murmur3`, Criterion, dev-only `gix`)

gix-commitgraph/src/lib.rs

gix-commitgraph/src/file/init.rs

gix-commitgraph/src/bloom.rs

gix-blame/src/types.rs

gix-blame/src/file/function.rs

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a85c1fe07e

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "Codex (@codex) review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "Codex (@codex) address that feedback".

gix-commitgraph/src/bloom.rs

Christoph Rüßler (cruessler) · 2026-03-25T12:43:34Z

Vicent Martí (@vmg) I’m really sorry I haven’t responded earlier, I had a few very busy weeks at work, so was lacking review capacity.

Sebastian Thiel (@Byron) Things have gotten quieter this week, so I can prioritize reviewing the blame-related parts this weekend if you want an extra pair of eyes.

Vicent Martí (vmg) · 2026-03-25T13:40:37Z

No worries! I know you're all busy! I'm gonna go through all the AI comments first (I can see a couple legit ones) and then I'll hand it over so you guys can do human review. Thanks again.

Sebastian Thiel (Byron) · 2026-03-26T04:57:39Z

Thanks everyone! Yes, I will be happy to wait for your review Christoph Rüßler (@cruessler), and you can hand it over to me even with things that are still to be addressed, as long as you want to go forward with the changes. Then I will take it towards merging, and will also be looking forward to seeing more contributions and improvements from Vicent Martí (@vmg) 🙏.

Christoph Rüßler (cruessler) · 2026-03-29T14:40:40Z

No worries! I know you're all busy! I'm gonna go through all the AI comments first (I can see a couple legit ones) and then I'll hand it over so you guys can do human review. Thanks again.

I’ve gone through the changes related to incrementally emitting hunks at a high level, and they look very promising to me! Looking forward to you handing over after you’ve gone through AI’s comments!

One thing I was wondering: in gitoxide tests, we usually try to run our implementations against a git baseline if there is one. In this instance, do you think it would make sense to test the incremental part of the blame algorithm against git’s implementation? We know that gix-blame doesn’t match git blame in 100 % of cases due to small differences in their respective diff algorithms, but there’s still such a significant overlap that writing such tests could be useful.

Vicent Martí (vmg) · 2026-04-03T09:40:38Z

Just went through the AI review comments, they were all basically the same (good) point: since we're not supporting the deprecated v1 format, we should explicitly ensure that we don't load the bloom filter cache when it's v1.

Let me know what else needs changing!

Signed-off-by: Vicent Marti <vmg@strn.cat>

Sebastian Thiel (Byron) · 2026-04-06T00:17:52Z

Vicent Martí (@vmg) Maybe we can work towards merging the bloom-filter support first?

From there, gix_blame::incremental appears to be a re-implementation of the original non-incremental algorithm that lives side-by-side, and Christoph Rüßler (@cruessler) wonders how that can be validated more with baseline tests. From all I know, we already have baseline tests and I'd think they can be adapted to run against the incremental implementation as well.
And I agree, I think having the same test-coverage as the existing non-incremental implementation would be a requirement for a merge (and I think it's in the interest of everyone).

Christoph Rüßler (cruessler) · 2026-04-11T07:36:09Z

Thanks a lot for all the changes, so far the code has been really easy to follow and understand which is much appreciated! I started reviewing the gix log PATHSPEC/log_file part and will go through this PR’s major topics one by one (where I think my review and testing might be helpful). In terms of getting it merged as efficiently as possible, I think it would benefit from being split into smaller PRs as it looks like it will take me a couple of hours to do all parts I want to review the justice they deserve. And I might need to spread the reviews a little to keep my eyes fresh. :-)

I just tested the output of running gix log README.md and gix log Cargo.lock in this branch, running a diff with the output of git log --oneline --full-history README.md and Cargo.lock, respectively. Apart from the fact that git log returned hashes that were one character longer than the ones returned by gix (I don’t think that needs to be changed), there were some minor differences in commit titles (I didn’t dig further) and some larger ones with respect to the order of commits in two blocks that had about 2 dozen commits each. At this point, I assume these differences are related to gix::traverse::commit::topo::Builder and not to the newly added code, though.

Based on these observations, I think log_file is already good to go (I know it depends on the commit graph-related changes)!

Christoph Rüßler (cruessler)

This is a review for the commit optimize process_changes to avoid redundant work. Just a few suggestions.

On a separate note: both Sebastian Thiel (@Byron) and I already mentioned so-called baseline tests. In the context of this PR, this would mean creating .baseline files similar to how it is done, e. g., in this script:

gitoxide/gix-blame/tests/fixtures/make_blame_repo.sh

Lines 262 to 297 in 5b94604

    
           git blame --porcelain simple.txt > .git/simple.baseline 
        
           git blame --porcelain -L 1,2 simple.txt > .git/simple-lines-1-2.baseline 
        
           git blame --porcelain -L 1,2 -L 4 simple.txt > .git/simple-lines-multiple-1-2-and-4.baseline 
        
           git blame --porcelain --since 2025-01-31 simple.txt > .git/simple-since.baseline 
        
           git blame --porcelain multiline-hunks.txt > .git/multiline-hunks.baseline 
        
           git blame --porcelain deleted-lines.txt > .git/deleted-lines.baseline 
        
           git blame --porcelain deleted-lines-multiple-hunks.txt > .git/deleted-lines-multiple-hunks.baseline 
        
           git blame --porcelain changed-lines.txt > .git/changed-lines.baseline 
        
           git blame --porcelain changed-line-between-unchanged-lines.txt > .git/changed-line-between-unchanged-lines.baseline 
        
           git blame --porcelain added-lines.txt > .git/added-lines.baseline 
        
           git blame --porcelain added-lines-around.txt > .git/added-lines-around.baseline 
        
           git blame --porcelain switched-lines.txt > .git/switched-lines.baseline 
        
           git blame --porcelain added-line-before-changed-line.txt > .git/added-line-before-changed-line.baseline 
        
           git blame --porcelain same-line-changed-twice.txt > .git/same-line-changed-twice.baseline 
        
           git blame --porcelain coalesce-adjacent-hunks.txt > .git/coalesce-adjacent-hunks.baseline 
        
           mkdir .git/sub-directory 
        
           git blame --porcelain sub-directory/sub-directory.txt > .git/sub-directory/sub-directory.baseline 
        
           git blame --porcelain after-rename.txt > .git/after-rename.baseline 
        
           git blame --porcelain after-second-rename.txt > .git/after-second-rename.baseline 
        
           git blame --porcelain after-rewrite.txt > .git/after-rewrite.baseline 
        
           git blame --porcelain sub-directory/after-move-to-sub-directory.txt > .git/sub-directory/after-move-to-sub-directory.baseline 
        
           git blame --porcelain resolved-conflict.txt > .git/resolved-conflict.baseline 
        
           git blame --porcelain file-in-one-chain-of-ancestors.txt > .git/file-in-one-chain-of-ancestors.baseline 
        
           git blame --porcelain different-file-in-another-chain-of-ancestors.txt > .git/different-file-in-another-chain-of-ancestors.baseline 
        
           git blame --porcelain file-only-changed-in-branch.txt > .git/file-only-changed-in-branch.baseline 
        
           git blame --porcelain file-changed-in-two-branches.txt > .git/file-changed-in-two-branches.baseline 
        
           git blame --porcelain file-topo-order-different-than-date-order.txt > .git/file-topo-order-different-than-date-order.baseline 
        
           git blame --porcelain empty-lines-histogram.txt > .git/empty-lines-histogram.baseline 
        
           git config --local diff.algorithm myers 
        
           git blame --porcelain empty-lines-myers.txt > .git/empty-lines-myers.baseline

. Then, there would be a parser similar to this one:

gitoxide/gix-blame/tests/blame.rs

Lines 11 to 146 in 5b94604

    
           struct Baseline<'a> { 
        
               lines: bstr::Lines<'a>, 
        
               filenames: BTreeMap<ObjectId, bstr::BString>, 
        
           } 
        
           mod baseline { 
        
               use std::{collections::BTreeMap, path::Path}; 
        
               use gix_blame::BlameEntry; 
        
               use gix_hash::ObjectId; 
        
               use gix_ref::bstr::ByteSlice; 
        
               use super::Baseline; 
        
               // These fields are used by `git` in its porcelain output. 
        
               const HEADER_FIELDS: [&str; 12] = [ 
        
                   // https://github.com/git/git/blob/6258f68c3c1092c901337895c864073dcdea9213/builtin/blame.c#L256-L280 
        
                   "author", 
        
                   "author-mail", 
        
                   "author-time", 
        
                   "author-tz", 
        
                   "committer", 
        
                   "committer-mail", 
        
                   "committer-time", 
        
                   "committer-tz", 
        
                   "summary", 
        
                   "boundary", 
        
                   // https://github.com/git/git/blob/6258f68c3c1092c901337895c864073dcdea9213/builtin/blame.c#L239-L248 
        
                   "previous", 
        
                   "filename", 
        
               ]; 
        
               fn is_known_header_field(field: &&str) -> bool { 
        
                   HEADER_FIELDS.contains(field) 
        
               } 
        
               impl Baseline<'_> { 
        
                   pub fn collect( 
        
                       baseline_path: impl AsRef<Path>, 
        
                       source_file_name: gix_object::bstr::BString, 
        
                   ) -> std::io::Result<Vec<BlameEntry>> { 
        
                       let content = std::fs::read(baseline_path)?; 
        
                       let baseline = Baseline { 
        
                           lines: content.lines(), 
        
                           filenames: BTreeMap::default(), 
        
                       }; 
        
                       Ok(baseline 
        
                           .map(|entry| { 
        
                               let source_file_name = if entry.source_file_name.as_ref() == Some(&source_file_name) { 
        
                                   None 
        
                               } else { 
        
                                   entry.source_file_name 
        
                               }; 
        
                               BlameEntry { 
        
                                   source_file_name, 
        
                                   ..entry 
        
                               } 
        
                           }) 
        
                           .collect()) 
        
                   } 
        
               } 
        
               impl Iterator for Baseline<'_> { 
        
                   type Item = BlameEntry; 
        
                   fn next(&mut self) -> Option<Self::Item> { 
        
                       let mut ranges = None; 
        
                       let mut commit_id = gix_hash::Kind::Sha1.null(); 
        
                       let mut skip_lines: u32 = 0; 
        
                       let mut source_file_name: Option<gix_object::bstr::BString> = None; 
        
                       for line in self.lines.by_ref() { 
        
                           if line.starts_with(b"\t") { 
        
                               // Each group consists of a header and one or more lines. We break from the 
        
                               // loop, thus returning a `BlameEntry` from `next` once we have seen the number 
        
                               // of lines starting with "\t" as indicated in the group’s header. 
        
                               skip_lines -= 1; 
        
                               if skip_lines == 0 { 
        
                                   break; 
        
                               } else { 
        
                                   continue; 
        
                               } 
        
                           } 
        
                           let fields: Vec<&str> = line.to_str().unwrap().split(' ').collect(); 
        
                           if fields.len() == 4 { 
        
                               // We’re possibly dealing with a group header. 
        
                               // If we can’t parse the first field as an `ObjectId`, we know this is not a 
        
                               // group header, so we continue. This can yield false positives, but for 
        
                               // testing purposes, we don’t bother. 
        
                               commit_id = match ObjectId::from_hex(fields[0].as_bytes()) { 
        
                                   Ok(id) => id, 
        
                                   Err(_) => continue, 
        
                               }; 
        
                               let line_number_in_source_file = fields[1].parse::<u32>().unwrap(); 
        
                               let line_number_in_final_file = fields[2].parse::<u32>().unwrap(); 
        
                               // The last field indicates the number of lines this group contains info for 
        
                               // (this is not equal to the number of lines in git blame’s porcelain output). 
        
                               let number_of_lines_in_group = fields[3].parse::<u32>().unwrap(); 
        
                               skip_lines = number_of_lines_in_group; 
        
                               let source_range = 
        
                                   (line_number_in_source_file - 1)..(line_number_in_source_file + number_of_lines_in_group - 1); 
        
                               let blame_range = 
        
                                   (line_number_in_final_file - 1)..(line_number_in_final_file + number_of_lines_in_group - 1); 
        
                               assert!(ranges.is_none(), "should not overwrite existing ranges"); 
        
                               ranges = Some((blame_range, source_range)); 
        
                           } else if fields[0] == "filename" { 
        
                               // We need to store `source_file_name` as it is not repeated for subsequent 
        
                               // hunks that have the same `commit_id`. 
        
                               source_file_name = Some(fields[1].into()); 
        
                               self.filenames.insert(commit_id, fields[1].into()); 
        
                           } else if !is_known_header_field(&fields[0]) && ObjectId::from_hex(fields[0].as_bytes()).is_err() { 
        
                               panic!("unexpected line: '{:?}'", line.as_bstr()); 
        
                           } 
        
                       } 
        
                       let Some((range_in_blamed_file, range_in_source_file)) = ranges else { 
        
                           // No new lines were parsed, so we assume the iterator is finished. 
        
                           return None; 
        
                       }; 
        
                       Some(BlameEntry::new( 
        
                           range_in_blamed_file, 
        
                           range_in_source_file, 
        
                           commit_id, 
        
                           source_file_name.or_else(|| self.filenames.get(&commit_id).cloned()), 
        
                       )) 
        
                   } 
        
               } 
        
           }

. And then, the test would parse the relevant .baseline file and compare the output of gix_blame::incremental to git’s output. Can you add such baseline tests or do you see any issues with that approach?

Christoph Rüßler (cruessler) · 2026-04-11T07:47:14Z

gix-blame/src/file/mod.rs

+        if !hunk.has_suspect(&suspect) {
+            new_hunks_to_blame.push(hunk);
+            continue;
+        }

-    for mut hunk in hunks_to_blame.into_iter().map(Some) {
-        let mut offset_in_destination = Offset::Added(0);
+        let suspect_range = hunk.get_range(&suspect).expect("has_suspect was true");


Could this also be shortened to?

let Some(suspect_range) = hunk.get_range(&suspect) else { new_hunks_to_blame.push(hunk); continue; };

Christoph Rüßler (cruessler) · 2026-04-11T08:27:42Z

gix-blame/src/file/mod.rs

 ) -> Vec<UnblamedHunk> {
    let mut new_hunks_to_blame = Vec::new();
+    let mut offset_in_destination = Offset::Added(0);
+    let mut changes_pos = 0;


Instead of using changes_pos, would the following also be an option?

// Move these to lines out of the `for` loop let mut changes_iter = changes.iter().cloned(); let mut change = changes_iter.next(); // And then, instead of `changes_pos = 0` changes_iter = changes.iter().cloned(); change = changes_iter.next();

I’m mainly asking because I find it easier to wrap my head around iterators as opposed to manual tracking of indices.

Christoph Rüßler (cruessler) · 2026-04-11T08:30:37Z

gix-blame/src/file/mod.rs

-/// never have two different lines from a *Blamed File* mapped to the same line in a *Source File*,
-/// this property would need to be enforced at a higher level than `process_changes`.
-/// Then the nested loops could potentially be flattened into one.
+/// Uses a cursor to avoid restarting the changes iteration from the beginning for each hunk.


In subsequent_hunks_overlapping_end_of_addition, I think it would make sense to make the following change:

let changes = vec![ Change::AddedOrReplaced(5..9, 4), Change::Unchanged(9..10), Change::AddedOrReplaced(10..14, 0), ];

The test that was written as a small regression test when the original nested loops were introduced currently only has one change, so it doesn’t trigger resetting changes_pos in a meaningful way.

Vicent Martí (vmg) force-pushed the vmg/blame-incremental branch from 21e316d to b65f15b Compare March 6, 2026 17:14

chatgpt-codex-connector bot reviewed Mar 6, 2026

View reviewed changes

gix-commitgraph/src/bloom.rs Outdated Show resolved Hide resolved

Vicent Martí (vmg) force-pushed the vmg/blame-incremental branch 4 times, most recently from 52dd978 to e6b8998 Compare March 6, 2026 17:58

Christoph Rüßler (cruessler) reviewed Mar 8, 2026

View reviewed changes

gix-blame/src/types.rs Outdated Show resolved Hide resolved

gix-blame/tests/blame.rs Outdated Show resolved Hide resolved

LOCI-AI (loci-dev) mentioned this pull request Mar 9, 2026

UPSTREAM PR #2457: gix-blame: Incremental auroralabs-loci/gitoxide#31

Open

Vicent Martí (vmg) added 7 commits March 11, 2026 12:04

core: implement log_file

1be2065

Implement the log_file method in gitoxide-core, which allows performing path-delimited log commands. With the new changed paths bloom filter, it is not possible to perform this operation very efficiently.

gix-blame: allow interrupting an incremental blame

a85c1fe

The BlameSink type now returns a std::ops::ControlFlow value that can be used to interrupt the blame early. Signed-off-by: Vicent Marti <vmg@strn.cat>

Vicent Martí (vmg) force-pushed the vmg/blame-incremental branch from e6b8998 to a85c1fe Compare March 11, 2026 11:04

Sebastian Thiel (Byron) requested a review from Copilot March 25, 2026 12:15

Copilot started reviewing on behalf of Sebastian Thiel (Byron) March 25, 2026 12:16 View session

Copilot AI reviewed Mar 25, 2026

View reviewed changes

gix-commitgraph/src/lib.rs Show resolved Hide resolved

gix-commitgraph/src/file/init.rs Show resolved Hide resolved

gix-commitgraph/src/bloom.rs Show resolved Hide resolved

gix-blame/src/types.rs Show resolved Hide resolved

gix-blame/src/file/function.rs Show resolved Hide resolved

chatgpt-codex-connector bot reviewed Mar 25, 2026

View reviewed changes

gix-commitgraph/src/bloom.rs Show resolved Hide resolved

Vicent Martí (vmg) force-pushed the vmg/blame-incremental branch from 1e220a3 to 3b7bff2 Compare April 3, 2026 09:39

Vicent Martí (vmg) force-pushed the vmg/blame-incremental branch from 3b7bff2 to 08a1ca9 Compare April 3, 2026 10:15

gix-commitgraph: add support for v1 hashes too

5b94604

Signed-off-by: Vicent Marti <vmg@strn.cat>

Vicent Martí (vmg) force-pushed the vmg/blame-incremental branch from 08a1ca9 to 5b94604 Compare April 3, 2026 11:09

Christoph Rüßler (cruessler) suggested changes Apr 11, 2026

View reviewed changes

	git blame --porcelain simple.txt > .git/simple.baseline
	git blame --porcelain -L 1,2 simple.txt > .git/simple-lines-1-2.baseline
	git blame --porcelain -L 1,2 -L 4 simple.txt > .git/simple-lines-multiple-1-2-and-4.baseline
	git blame --porcelain --since 2025-01-31 simple.txt > .git/simple-since.baseline
	git blame --porcelain multiline-hunks.txt > .git/multiline-hunks.baseline
	git blame --porcelain deleted-lines.txt > .git/deleted-lines.baseline
	git blame --porcelain deleted-lines-multiple-hunks.txt > .git/deleted-lines-multiple-hunks.baseline
	git blame --porcelain changed-lines.txt > .git/changed-lines.baseline
	git blame --porcelain changed-line-between-unchanged-lines.txt > .git/changed-line-between-unchanged-lines.baseline
	git blame --porcelain added-lines.txt > .git/added-lines.baseline
	git blame --porcelain added-lines-around.txt > .git/added-lines-around.baseline
	git blame --porcelain switched-lines.txt > .git/switched-lines.baseline
	git blame --porcelain added-line-before-changed-line.txt > .git/added-line-before-changed-line.baseline
	git blame --porcelain same-line-changed-twice.txt > .git/same-line-changed-twice.baseline
	git blame --porcelain coalesce-adjacent-hunks.txt > .git/coalesce-adjacent-hunks.baseline

	mkdir .git/sub-directory
	git blame --porcelain sub-directory/sub-directory.txt > .git/sub-directory/sub-directory.baseline

	git blame --porcelain after-rename.txt > .git/after-rename.baseline
	git blame --porcelain after-second-rename.txt > .git/after-second-rename.baseline
	git blame --porcelain after-rewrite.txt > .git/after-rewrite.baseline
	git blame --porcelain sub-directory/after-move-to-sub-directory.txt > .git/sub-directory/after-move-to-sub-directory.baseline

	git blame --porcelain resolved-conflict.txt > .git/resolved-conflict.baseline
	git blame --porcelain file-in-one-chain-of-ancestors.txt > .git/file-in-one-chain-of-ancestors.baseline
	git blame --porcelain different-file-in-another-chain-of-ancestors.txt > .git/different-file-in-another-chain-of-ancestors.baseline
	git blame --porcelain file-only-changed-in-branch.txt > .git/file-only-changed-in-branch.baseline
	git blame --porcelain file-changed-in-two-branches.txt > .git/file-changed-in-two-branches.baseline
	git blame --porcelain file-topo-order-different-than-date-order.txt > .git/file-topo-order-different-than-date-order.baseline

	git blame --porcelain empty-lines-histogram.txt > .git/empty-lines-histogram.baseline

	git config --local diff.algorithm myers

	git blame --porcelain empty-lines-myers.txt > .git/empty-lines-myers.baseline

	struct Baseline<'a> {
	lines: bstr::Lines<'a>,
	filenames: BTreeMap<ObjectId, bstr::BString>,
	}

	mod baseline {
	use std::{collections::BTreeMap, path::Path};

	use gix_blame::BlameEntry;
	use gix_hash::ObjectId;
	use gix_ref::bstr::ByteSlice;

	use super::Baseline;

	// These fields are used by `git` in its porcelain output.
	const HEADER_FIELDS: [&str; 12] = [
	// https://github.com/git/git/blob/6258f68c3c1092c901337895c864073dcdea9213/builtin/blame.c#L256-L280
	"author",
	"author-mail",
	"author-time",
	"author-tz",
	"committer",
	"committer-mail",
	"committer-time",
	"committer-tz",
	"summary",
	"boundary",
	// https://github.com/git/git/blob/6258f68c3c1092c901337895c864073dcdea9213/builtin/blame.c#L239-L248
	"previous",
	"filename",
	];

	fn is_known_header_field(field: &&str) -> bool {
	HEADER_FIELDS.contains(field)
	}

	impl Baseline<'_> {
	pub fn collect(
	baseline_path: impl AsRef<Path>,
	source_file_name: gix_object::bstr::BString,
	) -> std::io::Result<Vec<BlameEntry>> {
	let content = std::fs::read(baseline_path)?;
	let baseline = Baseline {
	lines: content.lines(),
	filenames: BTreeMap::default(),
	};

	Ok(baseline
	.map(\|entry\| {
	let source_file_name = if entry.source_file_name.as_ref() == Some(&source_file_name) {
	None
	} else {
	entry.source_file_name
	};

	BlameEntry {
	source_file_name,
	..entry
	}
	})
	.collect())
	}
	}

	impl Iterator for Baseline<'_> {
	type Item = BlameEntry;

	fn next(&mut self) -> Option<Self::Item> {
	let mut ranges = None;
	let mut commit_id = gix_hash::Kind::Sha1.null();
	let mut skip_lines: u32 = 0;
	let mut source_file_name: Option<gix_object::bstr::BString> = None;

	for line in self.lines.by_ref() {
	if line.starts_with(b"\t") {
	// Each group consists of a header and one or more lines. We break from the
	// loop, thus returning a `BlameEntry` from `next` once we have seen the number
	// of lines starting with "\t" as indicated in the group’s header.
	skip_lines -= 1;

	if skip_lines == 0 {
	break;
	} else {
	continue;
	}
	}

	let fields: Vec<&str> = line.to_str().unwrap().split(' ').collect();
	if fields.len() == 4 {
	// We’re possibly dealing with a group header.
	// If we can’t parse the first field as an `ObjectId`, we know this is not a
	// group header, so we continue. This can yield false positives, but for
	// testing purposes, we don’t bother.
	commit_id = match ObjectId::from_hex(fields[0].as_bytes()) {
	Ok(id) => id,
	Err(_) => continue,
	};

	let line_number_in_source_file = fields[1].parse::<u32>().unwrap();
	let line_number_in_final_file = fields[2].parse::<u32>().unwrap();
	// The last field indicates the number of lines this group contains info for
	// (this is not equal to the number of lines in git blame’s porcelain output).
	let number_of_lines_in_group = fields[3].parse::<u32>().unwrap();

	skip_lines = number_of_lines_in_group;

	let source_range =
	(line_number_in_source_file - 1)..(line_number_in_source_file + number_of_lines_in_group - 1);
	let blame_range =
	(line_number_in_final_file - 1)..(line_number_in_final_file + number_of_lines_in_group - 1);
	assert!(ranges.is_none(), "should not overwrite existing ranges");
	ranges = Some((blame_range, source_range));
	} else if fields[0] == "filename" {
	// We need to store `source_file_name` as it is not repeated for subsequent
	// hunks that have the same `commit_id`.
	source_file_name = Some(fields[1].into());

	self.filenames.insert(commit_id, fields[1].into());
	} else if !is_known_header_field(&fields[0]) && ObjectId::from_hex(fields[0].as_bytes()).is_err() {
	panic!("unexpected line: '{:?}'", line.as_bstr());
	}
	}

	let Some((range_in_blamed_file, range_in_source_file)) = ranges else {
	// No new lines were parsed, so we assume the iterator is finished.
	return None;
	};
	Some(BlameEntry::new(
	range_in_blamed_file,
	range_in_source_file,
	commit_id,
	source_file_name.or_else(\|\| self.filenames.get(&commit_id).cloned()),
	))
	}
	}
	}

Uh oh!

Conversation

Vicent Martí (vmg) commented Mar 6, 2026

Uh oh!

chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

Christoph Rüßler (cruessler) left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Vicent Martí (vmg) commented Mar 11, 2026

Uh oh!

Vicent Martí (vmg) commented Mar 25, 2026

Uh oh!

Sebastian Thiel (Byron) commented Mar 25, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

Christoph Rüßler (cruessler) commented Mar 25, 2026

Uh oh!

Vicent Martí (vmg) commented Mar 25, 2026

Uh oh!

Sebastian Thiel (Byron) commented Mar 26, 2026

Uh oh!

Christoph Rüßler (cruessler) commented Mar 29, 2026

Uh oh!

Vicent Martí (vmg) commented Apr 3, 2026

Uh oh!

Sebastian Thiel (Byron) commented Apr 6, 2026

Uh oh!

Christoph Rüßler (cruessler) commented Apr 11, 2026

Uh oh!

Christoph Rüßler (cruessler) left a comment

Choose a reason for hiding this comment

Uh oh!

Christoph Rüßler (cruessler) Apr 11, 2026

Choose a reason for hiding this comment

Uh oh!

Christoph Rüßler (cruessler) Apr 11, 2026

Choose a reason for hiding this comment

Uh oh!

Christoph Rüßler (cruessler) Apr 11, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants