
feat: wire batch_size_bytes to Python and public Rust API #6428

Draft
westonpace wants to merge 3 commits into lance-format:main from westonpace:feat/byte-sized-batches-file-reader



@westonpace
Member

Summary

Stacked on #6388. Please merge that PR first.

  • Adds batch_size_bytes: Option<u64> to FileReaderOptions and propagates it through all 6 SchedulerDecoderConfig creation sites in the file reader
  • Adds batch_size_bytes field + setter to Scanner, wired through both scan_fragments (via LanceScanConfig) and pushdown_scan (via FileReaderOptions in ScanConfig)
  • Adds batch_size_bytes to LanceScanConfig, with try_new_v2 injecting it into FragReadConfig via FileReaderOptions
  • Exposes batch_size_bytes in the Python API: LanceDataset.scanner(), to_table(), to_batches(), ScannerBuilder
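To make the intent of a byte budget concrete, here is a minimal, self-contained sketch of what batching by bytes could look like next to batching by row count. It is an illustration only: `batch_rows` is a hypothetical helper, not a Lance API, and the rule it uses (close a batch once its accumulated size reaches the budget) is an assumption, since this PR does not describe how the encoding layer applies the limit.

```python
from typing import Iterable, Iterator, List, Optional


def batch_rows(
    row_sizes: Iterable[int],
    batch_size_rows: int,
    batch_size_bytes: Optional[int] = None,
) -> Iterator[List[int]]:
    """Group per-row byte sizes into batches (hypothetical, not Lance code).

    A batch is emitted when it holds `batch_size_rows` rows or, if
    `batch_size_bytes` is set, when its accumulated size reaches that
    budget. The budget is treated as a soft target (assumption): the row
    that crosses it stays in the current batch.
    """
    batch: List[int] = []
    batch_bytes = 0
    for size in row_sizes:
        batch.append(size)
        batch_bytes += size
        hit_rows = len(batch) >= batch_size_rows
        hit_bytes = batch_size_bytes is not None and batch_bytes >= batch_size_bytes
        if hit_rows or hit_bytes:
            yield batch
            batch, batch_bytes = [], 0
    if batch:
        # Emit whatever is left over at the end of the input.
        yield batch


# One unusually wide row (4000 bytes) among small ones.
rows = [100, 100, 4000, 100, 100, 100]

# Row-count batching alone ignores how wide the rows are.
print(list(batch_rows(rows, batch_size_rows=4)))

# With a byte budget, the wide row closes its batch early.
print(list(batch_rows(rows, batch_size_rows=4, batch_size_bytes=1000)))
```

With `batch_size_bytes=None`, the default here, the helper falls back to pure row-count batching, which mirrors the previous behavior the PR describes where callers always passed `None`.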

Test plan

  • cargo check -p lance-file -p lance --tests — clean
  • cargo clippy -p lance-file -p lance --tests -- -D warnings — clean
  • cargo fmt --all — applied
  • cargo test -p lance-encoding -- byte_sized — 3/3 pass
  • cargo test -p lance -- test_scan — 38/38 pass

🤖 Generated with Claude Code

github-actions bot added the enhancement (New feature or request) and python labels Apr 7, 2026
@codecov

codecov bot commented Apr 7, 2026

Codecov Report

❌ Patch coverage is 56.06061% with 29 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| rust/lance/src/dataset/scanner.rs | 39.28% | 15 Missing and 2 partials |
| rust/lance/src/io/exec/filtered_read.rs | 41.66% | 5 Missing and 2 partials |
| rust/lance/src/io/exec/scan.rs | 76.92% | 1 Missing and 2 partials |
| rust/lance-file/src/reader.rs | 84.61% | 2 Missing |


The encoding layer already supports byte-based batching via
SchedulerDecoderConfig.batch_size_bytes, but all callers hardcoded it to
None. This wires the parameter through FileReaderOptions, Scanner,
LanceScanConfig, and the Python bindings so users can specify it when
scanning a dataset.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@westonpace westonpace force-pushed the feat/byte-sized-batches-file-reader branch from f94ed45 to 9024fa5 Compare April 8, 2026 14:16
westonpace and others added 2 commits April 8, 2026 07:54
Instead of threading batch_size_bytes individually through LanceScanConfig
and FilteredReadOptions, pass the full FileReaderOptions bundle so future
options flow through automatically.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
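The design choice in that commit, forwarding the whole options bundle instead of threading one field at a time, can be sketched in a few lines. The classes below are hypothetical Python stand-ins that only borrow names from the PR text; the real types are Rust structs whose fields are not shown here, and the `batch_size` default of 8192 is an arbitrary placeholder.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class FileReaderOptions:
    """Hypothetical stand-in for the reader options bundle."""

    batch_size_bytes: Optional[int] = None
    # A future option added here needs no change in ScanConfig below.


@dataclass(frozen=True)
class ScanConfig:
    """Hypothetical scan-level config that carries the bundle as one field."""

    batch_size: int = 8192  # placeholder value, not taken from the PR
    file_reader_options: FileReaderOptions = field(default_factory=FileReaderOptions)


def decoder_config(scan: ScanConfig) -> dict:
    """Build the lowest-level settings from the forwarded bundle.

    Intermediate layers only pass `file_reader_options` along; this is the
    single place that looks inside it.
    """
    return {
        "batch_size": scan.batch_size,
        "batch_size_bytes": scan.file_reader_options.batch_size_bytes,
    }


# Default: no byte budget, matching the earlier hardcoded None.
print(decoder_config(ScanConfig()))

# Caller opts in to a 1 MiB budget without any intermediate layer changing.
opts = FileReaderOptions(batch_size_bytes=1 << 20)
print(decoder_config(ScanConfig(file_reader_options=opts)))
```

The trade-off is that intermediate layers carry options they never read, in exchange for not needing a signature change each time the reader gains a new setting.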
… levels

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Labels

enhancement (New feature or request), python

1 participant