perf(fs): O_DIRECT foundation — AlignedBuf + FsOpenOptions::direct_io (#133 phase 2)#310
Conversation
Heap-allocated byte buffer with caller-specified alignment. Vec<u8> defaults to align_of::<u8>() = 1, which violates O_DIRECT's typical 4 KiB userspace-buffer alignment requirement (EINVAL on unaligned write/read). AlignedBuf allocates via Layout::from_size_align with the requested boundary, rounds capacity up to a multiple of alignment, and exposes a Vec-like surface (len/capacity/as_slice/spare_capacity_mut/ set_len/clear) plus raw ptr accessors for kernel handoff. Zero-capacity allocations use a dangling NonNull cast from the alignment value, matching the std convention for empty owned buffers. Foundation only; no callers wired yet — direct_io path lands in subsequent commits on this branch. Part of #133
…s/IoUringFs Adds a `direct_io: bool` field + builder method to FsOpenOptions and threads it into the open path of both StdFs and IoUringFs. On Linux and Android (x86, x86_64, aarch64, riscv32/64, loongarch64, s390x — the arches where asm-generic/fcntl.h's O_DIRECT=0o40000 is the authoritative value) the flag becomes a `custom_flags(O_DIRECT)` on the std OpenOptions builder. O_DIRECT is declared as a named constant rather than pulled from libc — matches the EXDEV / flock pattern already established in std_fs.rs, keeps the crate libc-free, and lets the constant carry its own comment documenting the asm-generic source. Architectures with a divergent O_DIRECT bit (arm 0o200000, mips 0o100000, parisc, sparc) are NOT gated on purpose: emitting the wrong bit silently would be worse than honouring the doc'd "direct_io may be ignored" contract. macOS, Windows, and other Unix targets honour the doc contract by falling through to a cached open — F_NOCACHE on macOS and FILE_FLAG_NO_BUFFERING on Windows would each need their own opt-in plumbing and are out of scope for this hook. MemFs ignores `direct_io` (in-memory; the flag has no meaning). No consumers wired yet — callers landing in subsequent commits. Part of #133
|
Warning Review limit reached
Your plan currently allows 1 review/hour. Refill in 41 minutes and 37 seconds. Your organization has run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After more review capacity refills, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than trial, open-source, and free plans. In all cases, review capacity refills continuously over time. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (5)
📝 WalkthroughWalkthroughThis PR introduces infrastructure for opt-in ChangesO_DIRECT Aligned I/O Support
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related issues
Possibly related PRs
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
There was a problem hiding this comment.
Pull request overview
This PR adds the foundational pieces needed to support Linux O_DIRECT (page-cache bypass) in the filesystem abstraction layer, by introducing an aligned heap buffer and wiring a direct_io hint into FsOpenOptions for Linux/Android targets where O_DIRECT’s flag value is stable.
Changes:
- Add
AlignedBuf, a heap-allocated, caller-aligned byte buffer intended forO_DIRECT-compatible I/O. - Extend
FsOpenOptionswith adirect_io: boolfield and builder method. - Apply
O_DIRECTviaOpenOptionsExt::custom_flagsinStdFs::openandIoUringFs::open(arch-gated), and update open-options tests.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| src/fs/aligned_buf.rs | New AlignedBuf type providing aligned, zero-initialized heap storage and unit tests. |
| src/fs/mod.rs | Expose AlignedBuf from fs and add FsOpenOptions::direct_io with documentation + builder. |
| src/fs/std_fs.rs | Implement direct_io by conditionally setting O_DIRECT on OpenOptions (arch + OS gated); update tests. |
| src/fs/io_uring_fs.rs | Mirror direct_io handling for the io_uring backend via custom_flags(O_DIRECT) (arch gated). |
Three doc-only corrections raised in PR review: - AlignedBuf: invariant said "power-of-two multiple of alignment" but new_zeroed only rounds up to the next multiple of alignment (e.g. capacity=9000 → 3 × 4096 = 12288). Reword to match actual guarantee. - AlignedBuf::new_zeroed example was marked `ignore` with a stale comment about pub(crate) visibility; the type is in fact `pub` + re-exported from lsm_tree::fs. Promote the example to a real doctest so it actually runs. - FsOpenOptions::direct_io doc said the flag is a no-op only on non-Linux platforms; in reality it's also dropped on Linux architectures with a divergent O_DIRECT bit (arm, mips, parisc, sparc). Spell out the arch gate + restate \"this is a hint\" so callers don't assume cache bypass is in effect on every Linux target.
…docs Three review follow-ups: 1. `AlignedBuf` invariant doc: the `ptr` invariant claimed an always-valid allocation, but `new_zeroed(0, _)` intentionally returns a dangling sentinel synthesised from the requested alignment. Split the invariant into the capacity > 0 and capacity == 0 cases so the safety story matches the code. 2. `new_zeroed` zero-capacity comment said "0-byte allocation is undefined for the global allocator" — slightly imprecise: `Layout::from_size_align(0, _)` itself succeeds; what's UB is calling `alloc::alloc::alloc(layout)` with size==0 (per its trait doc). Reword to spell that out + cite std's own `Vec` / `NonNull::dangling()` precedent for the sentinel choice. 3. Extracted the `O_DIRECT` `OpenOptions::custom_flags` block into a new `fs::direct_io` module with one `apply_direct_io_flag` fn. `StdFs::open` and `IoUringFs::open` both call it instead of keeping their own copies of the arch-gating list + `O_DIRECT` constant. Removes the drift risk Copilot flagged (if one backend was updated to add a new arch and the other wasn't, `direct_io` would silently work for one and not the other on the same target). No behaviour change.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/fs/aligned_buf.rs`:
- Around line 296-300: The test new_zeroed_returns_zeroed_memory uses an unsafe
from_raw_parts of buf.as_ptr() and buf.capacity() without a SAFETY comment;
replace that unsafe raw-slice construction with the safe API
spare_capacity_mut() (or, if you must keep unsafe, add a // SAFETY: comment
explaining the invariants) so the test reads the buffer safely and asserts all
bytes are zero; refer to AlignedBuf::new_zeroed, AlignedBuf::spare_capacity_mut,
as_ptr and capacity to locate the code to change.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 073bd3b3-d89b-4afc-a1aa-b951f669ce61
📒 Files selected for processing (5)
src/fs/aligned_buf.rssrc/fs/direct_io.rssrc/fs/io_uring_fs.rssrc/fs/mod.rssrc/fs/std_fs.rs
Two review follow-ups, resolved as one change because they
concern the same method:
- The method returns a mutable slice over the FULL capacity, not
just the tail beyond len. Vec's / BytesMut's `spare_capacity` is
the tail (len..capacity), so the old name implied semantics the
method doesn't have. Rename to `as_capacity_mut` so the name
matches the actual surface ("full buffer mut"), and spell out
in the doc-comment WHY we expose the whole allocation:
`O_DIRECT` reads need to overwrite already-buffered bytes when
refilling a recycled buffer.
- `new_zeroed_returns_zeroed_memory` was reaching into the buffer
via `unsafe { slice::from_raw_parts(buf.as_ptr(), buf.capacity()) }`
with no SAFETY comment. Swap to the safe `as_capacity_mut()` API
— same observable behaviour, no unsafe in tests.
Spell out that the entire fs::* backend is std-bound today and that the gate-unit is the whole backend (tracked under no-std migration epic #274), not individual sub-modules. Gating direct_io alone while its consumers (StdFs, IoUringFs) stay ungated would be a no-op — std_fs would drag it in transitively. Documents the design choice in the module header so the same question doesn't get raised again.
… site Add a short comment next to `mod direct_io;` pointing at the full module-header explanation in direct_io.rs. The previous commit documented the design choice inside the module; this commit puts a signpost where reviewers reading fs/mod.rs first land, so they do not need to chase the rationale across files.
…doc link fix Three review follow-ups bundled because each is one-or-two lines and they share the same module surface: - `FsOpenOptions` gains `#[non_exhaustive]`. The new `direct_io` field landed in this branch is already a breaking change for downstream struct-literal callers; marking the struct non_exhaustive in the same release confines the break to exactly one release cycle and lets future fields land as semver-minor. Builder API was already complete for every field (`.read()`, `.write()`, …, `.direct_io()`), so the only callers affected are those bypassing the builder. - `direct_io.rs` module docs no longer link `[IoUringFs]` via intra-doc reference — that link breaks rustdoc when the `io-uring` feature is off (the type is `#[cfg]`-gated). Plain inline-code formatting (`` `IoUringFs` ``) renders identically in all feature configurations. - `AlignedBuf::new_zeroed` zero-capacity sentinel is now built via `core::ptr::without_provenance_mut(alignment)` instead of the raw `alignment as *mut u8` cast. Same observable address, but the API explicitly signals "address-only, no provenance, no associated allocation" — exactly what the never-dereferenced sentinel needs, and stays clean under strict-provenance lints.
Document the rationale next to the attribute itself so the reasoning lives at the declaration site (not only in release notes or the PR description). Pairs with the existing inline note on `mod direct_io;` that explains why that submodule is not cfg-gated.
First concrete step toward making the `fs::*` backend honest about
its std-bound surface. The wider Fs / FsFile trait + std_fs /
io_uring_fs still pull std::io::{Read, Write, Seek} and
std::path::Path unconditionally (no core::* equivalents), so this
single gate does not unblock the no-std build by itself — that
remains tracked under #274. But the `direct_io` submodule is purely
a new std-side helper, so gating it now keeps the new addition
honest with the policy and signals intent at the declaration site.
Default-features (std) build path is unchanged.
Earlier rounds documented \"not gated, gate-unit is whole backend\". The submodule is now actually gated behind feature = \"std\", so the old prose contradicted the code. Rewrite both touch points (module header in direct_io.rs and the inline comment in fs/mod.rs) to match: the gate IS in place as the first concrete step; full no-std viability is blocked on porting the Fs/FsFile traits off std::io + std::path (now tracked as #311), itself part of the no-std migration epic (#274).
…e field
Both intra-doc links in direct_io.rs (module header + apply_direct_io_flag
no-op variant) used the bare `FsOpenOptions::direct_io` path, which is
ambiguous because the struct has both a field and a same-named builder
method. Rustdoc would resolve to the method and hide the actual best-
effort contract that's documented on the field. Switch to the
`field@crate::fs::FsOpenOptions::direct_io` disambiguator so readers
land on the contract docs.
The apply.rs variant also corrected the path: it used `super::` from
inside the inner `mod apply { ... }`, which resolved to the wrong
parent (direct_io rather than fs). The absolute `crate::fs::*` form
makes the link work from either nesting level.
…'s std feature mod direct_io; in fs/mod.rs is gated behind #[cfg(feature = "std")], but the two callers (StdFs::open, IoUringFs::open) referenced it unconditionally. Under `--no-default-features --features alloc` the submodule is absent, so each call site added one extra resolution error on top of the type-checking errors std_fs / io_uring_fs already incur from their own std::* usage. Add matching #[cfg(feature = "std")] on both call sites so feature combinations stay coherent. io_uring_fs's whole module is already gated by cfg(all(target_os = "linux", feature = "io-uring")), and io-uring transitively pulls std (see Cargo.toml), so the new predicate is redundant in isolation — kept for symmetry with StdFs and to make the dependency on direct_io's gate explicit at the call site, without requiring auditors to chase the implication through the manifest.
Summary
Foundation layer for
O_DIRECTcache-bypass I/O. Two pieces:AlignedBuf(src/fs/aligned_buf.rs) — heap-allocated byte buffer with caller-specified alignment.Vec<u8>defaults toalign_of::<u8>() = 1, which violatesO_DIRECT's typical 4 KiB userspace-buffer alignment (EINVAL on unaligned write/read).AlignedBufallocates viaLayout::from_size_align, rounds capacity up to a multiple of alignment, and exposes aVec-like surface (len/capacity/as_slice/as_capacity_mut/set_len/clear) + raw ptr accessors for kernel handoff.FsOpenOptions::direct_io— newboolfield + builder method. On Linux/Android (arches whereasm-generic/fcntl.h'sO_DIRECT = 0o40000is the authoritative value: x86, x86_64, aarch64, riscv32/64, loongarch64, s390x) the flag becomes acustom_flags(O_DIRECT)on the stdOpenOptionsbuilder, in bothStdFs::openandIoUringFs::open.O_DIRECTis declared as a named constant rather than pulling inlibc— matches the existingEXDEV/flockpattern instd_fs.rs. Architectures with a divergentO_DIRECTbit (arm0o200000, mips0o100000, parisc, sparc) are not gated on purpose: emitting the wrong bit silently would be worse than honouring the documented "direct_iomay be ignored" contract. macOS / Windows / other Unix targets fall through to a cached open for the same reason —F_NOCACHE/FILE_FLAG_NO_BUFFERINGeach need their own opt-in plumbing.Design choices (explicit so review does not re-raise)
FsOpenOptionsis#[non_exhaustive]. This is breaking (struct-literal callers + exhaustive pattern matches stop compiling) but only relative to the same release that introduces the newdirect_iofield — which is already breaking for the same callers. Bundling both changes in one release confines the break to a single semver-major bump (v5.0.0, queued in PR chore: release v5.0.0 #272) and lets every future field land as semver-minor. Builder methods (.read(),.write(), ….direct_io()) cover every field, so non-struct-literal callers are unaffected.fs::direct_iosubmodule IS gated behind#[cfg(feature = "std")]. It usesstd::fs::OpenOptionsand is therefore std-only. The gate is the first concrete step toward honestly feature-gating the std backend per crate policy. However, it does NOT by itself unblock ano_std + allocbuild: the rest of thefs::*backend — theFs/FsFiletrait definitions and all impls (std_fs,io_uring_fs, evenMemFs) — still referencesstd::io::{Read, Write, Seek}+std::path::Pathdirectly, and those have nocore::*equivalents. Porting the trait surface offstd::io/std::pathis a structural prerequisite tracked as a separate prerequisite issue (refactor(fs): port Fs/FsFile traits off std::io + std::path to unblock no-std backend gating #311), itself part of the no-std migration epic (no-std + alloc migration: full crate #274). Theno-std-checkCI job iscontinue-on-errorfor exactly this reason — net error count is not changed by this PR.What's NOT in this PR
The Tree-level
Config::direct_ioknob and compaction-writer integration are deferred to a follow-up. The current SST writer usesVec<u8>buffers and unaligned block sizes; wiringdirect_iointoConfigwithout an alignment-aware writer would EINVAL on first write. Foundation lands here so the writer refactor can build on it.Test plan
AlignedBuf(alignment verification, capacity rounding, zero init, rejects non-power-of-two / excessive alignment, zero-capacity dangling sentinel,set_lengrowth + panic,clear,as_capacity_mutwrites,Send/Synccompile check, pointer stability)fs_open_options_default+fs_open_options_buildersupdated to cover the new fieldAlignedBuf::new_zeroedexample compiles and passesRelated
Fs/FsFiletraits offstd::io+std::path(prerequisite for the rest offs::*to become honestly feature-gateable)Part of #133