
Commit 94793bb

gix-reftable: port Git reftable implementation with parity tests
Why now
The goal is to land the reftable port as a standalone crate with strong parity coverage before any backend integration churn.

What changed
This squashed commit contains all standalone `gix-reftable` work that was previously split across 9 commits:
- workspace wiring for a dedicated `gix-reftable` crate
- low-level primitives (constants, varint, hash-kind, errors)
- record model and encode/decode for ref/log/obj/index records
- block source and single-table reader
- merged-table iterators with pq/tree helpers
- table writer with limits/index emission and options
- stack transactions, reload, auto-compaction, and fsck
- upstream-inspired `u-reftable-*` parity unit tests
- selected `t0610`/`t0613`/`t0614` scenario parity tests

Why this order
This commit is a squash of the previously reviewed sequence where each layer built on the previous one (primitives -> records -> io -> merged iteration -> writer -> stack -> tests).

What it unlocks next
A clean standalone reftable library baseline that can be integrated later into `gix-ref`/`gix` in follow-up work.

Prompt (verbatim)
Look at the reftable implementation at /Users/byron/dev/github.com/git/git and port it over to Rust in its own `gix-reftable` crate. Be sure to capture specific tests that exist. Follow through with the entire plan. Do not stop until it's all done. After each step, make a commit with a meaningful message and motivation. Show how the commit relates to the previous commit, and at least hint at how it's going to be relevant in future commits.

PLEASE IMPLEMENT THIS PLAN:

# Commit-By-Commit Execution Plan: Reftable Port + Integration

## Summary
Implement the full reftable port in `gix-reftable`, integrate it as a real backend in `gix-ref`/`gix`, and land parity tests in small, reviewable commits. Each commit is intentionally chained: it stabilizes one layer, then unlocks the next.

## Commit Sequence
1. **`workspace: add gix-reftable crate skeleton and wire it into Cargo workspace`**
   Motivation: create the isolated crate boundary first so all subsequent work lands incrementally.
   Relates to previous: baseline/no-op starting point.
   Future relevance: all reftable code/tests depend on this crate existing.
2. **`gix-reftable: port basics/constants/error/varint primitives from git/reftable`**
   Motivation: establish byte-order, varint, hash-id, and error semantics shared by all modules.
   Relates to previous: fills in core primitives in the new crate.
   Future relevance: record/block/table/writer code will reuse these primitives directly.
3. **`gix-reftable: implement record model and encode/decode parity (ref/log/obj/index)`**
   Motivation: record correctness is the format contract; everything else composes it.
   Relates to previous: consumes primitives and defines concrete wire payload behavior.
   Future relevance: block IO and iterators can now operate on typed records.
4. **`gix-reftable: implement block + blocksource + table reader`**
   Motivation: make reftable files readable end-to-end (header/sections/restarts/seek).
   Relates to previous: uses record codec to decode table contents.
   Future relevance: merged tables and stack logic need a working single-table reader.
5. **`gix-reftable: implement merged table iterators, pq, and tree helpers`**
   Motivation: parity for cross-table iteration and seek behavior.
   Relates to previous: builds on table reader to support multi-table views.
   Future relevance: stack and backend integration depend on merged iteration semantics.
6. **`gix-reftable: implement writer with limits/index emission/write options`**
   Motivation: enable producing valid tables and exercising write-path parity tests.
   Relates to previous: complements reader path using the same record/block contracts.
   Future relevance: stack transactions and compaction need writer callbacks.
7. **`gix-reftable: implement stack transactions, auto-compaction, reload, and fsck`**
   Motivation: complete operational backend behavior (`tables.list`, addition/commit, verify).
   Relates to previous: stack orchestrates reader/writer modules already landed.
   Future relevance: this is the direct foundation for `gix-ref` backend adapter.
8. **`gix-reftable/tests: port upstream u-reftable-* unit suites with 1:1 case mapping`**
   Motivation: lock behavioral parity at the library level before integration churn.
   Relates to previous: validates all crate modules in isolation.
   Future relevance: reduces regression risk when wiring into `gix-ref` and `gix`.
9. **`gix-reftable/tests: add selected t0610/t0613/t0614 behavior parity integration tests`**
   Motivation: cover high-value shell behavior in Rust tests (transactions/options/fsck/worktree).
   Relates to previous: adds scenario-level confidence on top of unit parity.
   Future relevance: these tests protect future backend integration refactors.
10. **`gix-ref: activate backend-agnostic store abstraction (files + reftable state)`**
    Motivation: remove hard coupling to file-store without changing behavior yet.
    Relates to previous: prepares host crate interface for plugging in reftable.
    Future relevance: next commit injects real reftable-backed implementation.
11. **`gix-ref: add reftable-backed store adapter and route find/iter/transaction operations`**
    Motivation: make `gix-ref` actually operate on reftable repositories.
    Relates to previous: fills the new abstraction with a concrete second backend.
    Future relevance: `gix` can now switch backend based on repository configuration.
12. **`gix: switch RefStore to backend-capable store and detect extensions.refStorage=reftable`**
    Motivation: enable end-to-end opening and reading of reftable repos in top-level API.
    Relates to previous: consumes backend-capable `gix-ref` APIs.
    Future relevance: unlocks fixing existing tests that currently assert reftable unsupported.
13. **`gix: make reference iteration/peeling/fetch update paths backend-agnostic`**
    Motivation: remove residual file-only assumptions in critical flows.
    Relates to previous: completes runtime behavior for common operations.
    Future relevance: ensures future features (e.g., optimizations) won’t regress reftable path.
14. **`tests: update reftable open/head expectations and add cross-backend regression coverage`**
    Motivation: reflect new supported behavior and guard interoperability paths.
    Relates to previous: validates functional integration in `gix` public workflows.
    Future relevance: serves as long-term guardrail for both backends.
15. **`docs/status: document reftable support, sha256 boundary, and update crate-status`**
    Motivation: finalize user/developer-facing contract and current limitations.
    Relates to previous: documents the now-landed behavior.
    Future relevance: provides clear baseline for follow-up work (end-to-end SHA-256 in `gix`).

## Per-Commit Validation Rule
For each commit, run the smallest relevant test slice before committing, then run a broader slice periodically:
- crate-local unit tests for touched modules,
- `gix-reftable` parity suites,
- `gix-ref` targeted tests,
- `gix` targeted repository/reference tests.

## Commit Message Format Rule
Every commit body will include:
- **Why now** (motivation),
- **What changed** (scope),
- **Why this order** (relation to previous commit),
- **What it unlocks next** (future relevance).

## Assumptions
- Source parity target is Git’s in-tree reftable C implementation and tests.
- `gix-reftable` supports SHA-1 and SHA-256; `gix` integration remains SHA-1-only in this batch.
- No squashing: one commit per step as listed above.
1 parent e8bf096 commit 94793bb

29 files changed

+4041
-0
lines changed

Cargo.lock

Lines changed: 10 additions & 0 deletions
Some generated files are not rendered by default.

Cargo.toml

Lines changed: 1 addition & 0 deletions
@@ -229,6 +229,7 @@ members = [
     "gix-hash",
     "gix-validate",
     "gix-ref",
+    "gix-reftable",
     "gix-command",
     "gix-config",
     "gix-config-value",

gix-reftable/CHANGELOG.md

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+# Changelog
+
+## Unreleased
+
+- Initial crate skeleton.

gix-reftable/Cargo.toml

Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
+lints.workspace = true
+
+[package]
+name = "gix-reftable"
+version = "0.0.0"
+repository = "https://github.com/GitoxideLabs/gitoxide"
+license = "MIT OR Apache-2.0"
+description = "Read and write Git reftable storage"
+authors = ["Sebastian Thiel <sebastian.thiel@icloud.com>"]
+edition = "2021"
+include = ["src/**/*", "LICENSE-*"]
+rust-version = "1.82"
+
+[lib]
+doctest = false
+test = true
+
+[dependencies]
+crc32fast = "1.5.0"
+flate2 = "1.1.5"
+gix-hash = { version = "^0.22.1", path = "../gix-hash", features = ["sha1", "sha256"] }
+thiserror = "2.0.18"

gix-reftable/LICENSE-APACHE

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+../LICENSE-APACHE

gix-reftable/LICENSE-MIT

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+../LICENSE-MIT

gix-reftable/src/basics.rs

Lines changed: 171 additions & 0 deletions
@@ -0,0 +1,171 @@
+use crate::error::Error;
+
+/// Hash identifiers used by reftable.
+#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, Ord, PartialOrd)]
+pub enum HashId {
+    /// SHA-1 object IDs.
+    Sha1,
+    /// SHA-256 object IDs.
+    Sha256,
+}
+
+impl HashId {
+    /// Return the byte-size of object IDs for this hash.
+    pub const fn size(self) -> usize {
+        match self {
+            HashId::Sha1 => 20,
+            HashId::Sha256 => 32,
+        }
+    }
+
+    /// Return the [gix_hash::Kind] corresponding to this hash ID.
+    pub const fn to_gix(self) -> gix_hash::Kind {
+        match self {
+            HashId::Sha1 => gix_hash::Kind::Sha1,
+            HashId::Sha256 => gix_hash::Kind::Sha256,
+        }
+    }
+}
+
+/// Return the shared-prefix size between `a` and `b`.
+pub fn common_prefix_size(a: &[u8], b: &[u8]) -> usize {
+    a.iter().zip(b.iter()).take_while(|(a, b)| a == b).count()
+}
+
+/// Put a big-endian 64-bit integer into `out`.
+pub fn put_be64(out: &mut [u8; 8], value: u64) {
+    *out = value.to_be_bytes();
+}
+
+/// Put a big-endian 32-bit integer into `out`.
+pub fn put_be32(out: &mut [u8; 4], value: u32) {
+    *out = value.to_be_bytes();
+}
+
+/// Put a big-endian 24-bit integer into `out`.
+pub fn put_be24(out: &mut [u8; 3], value: u32) {
+    out[0] = ((value >> 16) & 0xff) as u8;
+    out[1] = ((value >> 8) & 0xff) as u8;
+    out[2] = (value & 0xff) as u8;
+}
+
+/// Put a big-endian 16-bit integer into `out`.
+pub fn put_be16(out: &mut [u8; 2], value: u16) {
+    *out = value.to_be_bytes();
+}
+
+/// Read a big-endian 64-bit integer.
+pub fn get_be64(input: &[u8; 8]) -> u64 {
+    u64::from_be_bytes(*input)
+}
+
+/// Read a big-endian 32-bit integer.
+pub fn get_be32(input: &[u8; 4]) -> u32 {
+    u32::from_be_bytes(*input)
+}
+
+/// Read a big-endian 24-bit integer.
+pub fn get_be24(input: &[u8; 3]) -> u32 {
+    ((input[0] as u32) << 16) | ((input[1] as u32) << 8) | (input[2] as u32)
+}
+
+/// Read a big-endian 16-bit integer.
+pub fn get_be16(input: &[u8; 2]) -> u16 {
+    u16::from_be_bytes(*input)
+}
+
+/// Encode a reftable varint into `out` and return the number of bytes written.
+///
+/// The format is the same as reftable's/ofs-delta's encoding.
+pub fn encode_varint(mut value: u64, out: &mut [u8; 10]) -> usize {
+    let mut tmp = [0u8; 10];
+    let mut n = 0usize;
+    tmp[n] = (value & 0x7f) as u8;
+    n += 1;
+    while value >= 0x80 {
+        value = (value >> 7) - 1;
+        tmp[n] = 0x80 | (value & 0x7f) as u8;
+        n += 1;
+    }
+    // `tmp` holds the least-significant group first; emit the groups in reverse.
+    for (dst, src) in out.iter_mut().take(n).zip(tmp[..n].iter().rev()) {
+        *dst = *src;
+    }
+    n
+}
+
+/// Decode a reftable varint from `input`.
+///
+/// Returns `(value, consumed_bytes)`.
+pub fn decode_varint(input: &[u8]) -> Result<(u64, usize), Error> {
+    if input.is_empty() {
+        return Err(Error::Truncated);
+    }
+    let mut i = 0usize;
+    let mut c = input[i];
+    i += 1;
+    let mut value = u64::from(c & 0x7f);
+    while c & 0x80 != 0 {
+        if i >= input.len() {
+            return Err(Error::Truncated);
+        }
+        c = input[i];
+        i += 1;
+        value = value
+            .checked_add(1)
+            .ok_or(Error::VarintOverflow)?
+            .checked_mul(0x80) // like `<< 7`, but fails if high bits would be lost
+            .ok_or(Error::VarintOverflow)?
+            .checked_add(u64::from(c & 0x7f))
+            .ok_or(Error::VarintOverflow)?;
+    }
+    Ok((value, i))
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+
+    #[test]
+    fn hash_sizes() {
+        assert_eq!(HashId::Sha1.size(), 20);
+        assert_eq!(HashId::Sha256.size(), 32);
+    }
+
+    #[test]
+    fn common_prefix() {
+        assert_eq!(common_prefix_size(b"refs/heads/a", b"refs/heads/b"), 11);
+        assert_eq!(common_prefix_size(b"x", b"y"), 0);
+        assert_eq!(common_prefix_size(b"", b"abc"), 0);
+    }
+
+    #[test]
+    fn be_roundtrip() {
+        let mut be64 = [0u8; 8];
+        put_be64(&mut be64, 0x0102_0304_0506_0708);
+        assert_eq!(get_be64(&be64), 0x0102_0304_0506_0708);
+
+        let mut be32 = [0u8; 4];
+        put_be32(&mut be32, 0x0102_0304);
+        assert_eq!(get_be32(&be32), 0x0102_0304);
+
+        let mut be24 = [0u8; 3];
+        put_be24(&mut be24, 0x01_02_03);
+        assert_eq!(get_be24(&be24), 0x01_02_03);
+
+        let mut be16 = [0u8; 2];
+        put_be16(&mut be16, 0x0102);
+        assert_eq!(get_be16(&be16), 0x0102);
+    }
+
+    #[test]
+    fn varint_roundtrip() {
+        let mut storage = [0u8; 10];
+        for value in [0, 1, 2, 126, 127, 128, 129, 16_384, u32::MAX as u64, u64::MAX] {
+            let n = encode_varint(value, &mut storage);
+            let (decoded, consumed) = decode_varint(&storage[..n]).expect("valid");
+            assert_eq!(consumed, n);
+            assert_eq!(decoded, value);
+        }
+    }
+}
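
Review note: the varint helpers in `basics.rs` use an offset encoding, where each continuation step subtracts 1 on encode and adds 1 on decode, so the byte sequences at size boundaries are not what a plain LEB128-style reader would expect. The following is a minimal standalone sketch of that scheme, not the crate's code: it assumes `Vec<u8>` and `Option` in place of the fixed `[u8; 10]` buffer and the crate's `Error` type, and the free functions here are illustrative stand-ins for the ones in the diff.

```rust
/// Encode `value` with the offset varint scheme, most-significant group first.
/// (Illustrative stand-in: returns a `Vec<u8>` instead of filling a fixed buffer.)
fn encode_varint(mut value: u64) -> Vec<u8> {
    // The final byte carries the low 7 bits and has no continuation bit.
    let mut bytes = vec![(value & 0x7f) as u8];
    while value >= 0x80 {
        // Subtracting 1 removes the redundancy of a zero-valued leading group.
        value = (value >> 7) - 1;
        bytes.push(0x80 | (value & 0x7f) as u8);
    }
    bytes.reverse();
    bytes
}

/// Decode an offset varint, returning `(value, consumed_bytes)`.
/// `None` stands in for both the truncated and the overflow error cases.
fn decode_varint(input: &[u8]) -> Option<(u64, usize)> {
    let (&first, rest) = input.split_first()?;
    let mut value = u64::from(first & 0x7f);
    let mut consumed = 1;
    let mut c = first;
    let mut remaining = rest.iter();
    while c & 0x80 != 0 {
        c = *remaining.next()?;
        consumed += 1;
        // `checked_mul` reports overflow, which a checked shift would not:
        // `checked_shl` only validates the shift amount, not the lost bits.
        value = value
            .checked_add(1)?
            .checked_mul(0x80)?
            .checked_add(u64::from(c & 0x7f))?;
    }
    Some((value, consumed))
}

fn main() {
    // One byte covers 0..=127.
    assert_eq!(encode_varint(127), vec![0x7f]);
    // Two bytes cover 128..=16_511 thanks to the offset.
    assert_eq!(encode_varint(128), vec![0x80, 0x00]);
    assert_eq!(encode_varint(16_511), vec![0xff, 0x7f]);
    // The next value needs a third byte.
    assert_eq!(encode_varint(16_512), vec![0x80, 0x80, 0x00]);

    assert_eq!(decode_varint(&[0x80, 0x80, 0x00]), Some((16_512, 3)));
    // A dangling continuation bit is rejected as truncated input.
    assert_eq!(decode_varint(&[0x80]), None);
    // Over-long input is rejected instead of silently wrapping.
    assert_eq!(decode_varint(&[0xff; 11]), None);
}
```

The boundary values are the useful part for parity testing: a decoder that forgets the `+ 1` would read `[0x80, 0x00]` as 0 rather than 128.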
