
feat: support distributed bitmap index builds #6429

Open
butnaruandrei wants to merge 2 commits into lance-format:main from butnaruandrei:distributed-bitmap-index-build


butnaruandrei commented Apr 7, 2026

Allows bitmap indexes to be built in a distributed/fragment-scoped manner,
consistent with the existing BTree and FTS distributed build patterns.

  • BitmapIndexPlugin::train_index now accepts fragment_ids, writing
    per-shard part_{partition_id}_bitmap_page_lookup.lance files instead
    of erroring when fragment_ids is set
  • merge_index_files performs a K-way sorted merge of shard files,
    unioning RowAddrTreeMaps for matching keys, then writes the canonical
    bitmap_page_lookup.lance and deletes the shards
  • Dataset::merge_index_metadata dispatches IndexType::Bitmap to the
    new merger
  • Python SupportedDistributedIndices gains a BITMAP variant
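The shard-then-merge flow in the list above can be illustrated with a small, standalone Python sketch. It is not Lance code: a Python `set` stands in for `RowAddrTreeMap`, shards are in-memory lists rather than `.lance` files, and `train_shard`, `merge_shards`, and the `(fragment_id, row_addr, value)` row shape are hypothetical names chosen for illustration. What it does show is the core of the merge: each shard is sorted by key, `heapq.merge` performs the K-way interleave, and entries with equal keys are unioned.

```python
import heapq
from itertools import groupby
from operator import itemgetter
from typing import Any, Dict, Iterable, List, Set, Tuple

# One lookup entry: (bitmap key, set of row addresses holding that key).
# Hypothetical model: a Python set stands in for Lance's RowAddrTreeMap.
Entry = Tuple[Any, Set[int]]


def train_shard(rows: Iterable[Tuple[int, int, Any]],
                fragment_ids: Set[int],
                partition_id: int) -> Tuple[str, List[Entry]]:
    """Build a key-sorted lookup covering only rows in `fragment_ids`.

    `rows` yields (fragment_id, row_addr, value) triples, an input shape
    assumed for this sketch.
    """
    lookup: Dict[Any, Set[int]] = {}
    for frag, addr, value in rows:
        if frag in fragment_ids:
            lookup.setdefault(value, set()).add(addr)
    # Shard name follows the per-partition pattern described in the PR.
    shard_name = f"part_{partition_id}_bitmap_page_lookup.lance"
    return shard_name, sorted(lookup.items(), key=itemgetter(0))


def merge_shards(shards: List[List[Entry]]) -> List[Entry]:
    """K-way merge of key-sorted shards, unioning row sets for equal keys."""
    # heapq.merge lazily interleaves the already-sorted shards in key order,
    # holding one pending entry per shard in its internal heap.
    stream = heapq.merge(*shards, key=itemgetter(0))
    merged: List[Entry] = []
    # Equal keys from different shards arrive adjacently, so groupby
    # collects them into one group that is unioned into a single entry.
    for key, group in groupby(stream, key=itemgetter(0)):
        rows: Set[int] = set()
        for _, addrs in group:
            rows |= addrs
        merged.append((key, rows))
    return merged
```

Training rows from fragments 0 and 1 as partitions 0 and 1 and then merging the two shards yields one entry per key, with a key present in both fragments mapping to the union of their row addresses. Because the merge step only keeps one pending entry per shard, it does not need to load every shard into memory at once; writing the canonical file and deleting the shards are I/O steps this sketch leaves out.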

Non-distributed builds (no fragment_ids) are unchanged. After the merge, the
on-disk layout is identical to that of a monolithic build.
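The claim that a merged distributed build matches a monolithic build can be phrased as a property check. The sketch below is a hypothetical, self-contained model, not Lance's API: `build_lookup` plays the role of a monolithic build, `distributed_build` builds one shard per fragment and unions the shards with a plain dictionary (a simplification of the streaming K-way merge), and both return key-sorted lookups that can be compared directly.

```python
from itertools import groupby
from operator import itemgetter
from typing import Any, Dict, FrozenSet, Iterable, List, Set, Tuple

# Hypothetical model: a key-sorted list of (value, row addresses).
Lookup = List[Tuple[Any, FrozenSet[int]]]


def build_lookup(rows: Iterable[Tuple[int, int, Any]]) -> Lookup:
    """Monolithic build over (fragment_id, row_addr, value) rows."""
    acc: Dict[Any, Set[int]] = {}
    for _frag, addr, value in rows:
        acc.setdefault(value, set()).add(addr)
    return [(k, frozenset(v)) for k, v in sorted(acc.items(), key=itemgetter(0))]


def distributed_build(rows: List[Tuple[int, int, Any]]) -> Lookup:
    """Build one shard per fragment, then union the shards key by key."""
    by_frag = sorted(rows, key=itemgetter(0))
    # Each groupby group is consumed by build_lookup before the next one starts.
    shards = [build_lookup(group) for _, group in groupby(by_frag, key=itemgetter(0))]
    merged: Dict[Any, Set[int]] = {}
    for shard in shards:
        for key, addrs in shard:
            merged.setdefault(key, set()).update(addrs)
    return [(k, frozenset(v)) for k, v in sorted(merged.items(), key=itemgetter(0))]
```

Checking `build_lookup(rows) == distributed_build(rows)` over varied inputs is one way to test that fragment-scoped builds followed by a merge reproduce the single-pass result.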

feat: add support for Bitmap index type in LanceDataset and related components

- Updated `LanceDataset` to merge Bitmap index files alongside existing index types.
- Introduced `BITMAP` as a supported index type in `SupportedDistributedIndices`.
- Added a new test for building and merging Bitmap indices, ensuring compatibility with existing data.
- Enhanced Rust implementation to handle Bitmap index creation and merging.

This update lets Bitmap indices be built per fragment and merged afterwards, as BTree and FTS indices already can be.
butnaruandrei changed the title from "feat: add support for Bitmap index type in LanceDataset and related c…" to "feat: add support for Bitmap index type in LanceDataset" on Apr 7, 2026
github-actions bot added the enhancement (New feature or request) and python labels on Apr 7, 2026
butnaruandrei changed the title from "feat: add support for Bitmap index type in LanceDataset" to "feat: add support for distributed Bitmap index type in LanceDataset" on Apr 7, 2026
github-actions bot commented Apr 7, 2026

ACTION NEEDED
Lance follows the Conventional Commits specification for release automation.

The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.

For details on the error please inspect the "PR Title Check" action.

butnaruandrei changed the title from "feat: add support for distributed Bitmap index type in LanceDataset" to "feat: add support for distributed Bitmap index type in LanceDataset" on Apr 7, 2026
butnaruandrei changed the title from "feat: add support for distributed Bitmap index type in LanceDataset" to "feat: support distributed bitmap index builds" on Apr 7, 2026
Xuanwo (Collaborator) left a comment


Thank you for working on this! However, we have designed a new general distributed API and pipeline, described here: https://lance.org/guide/distributed_indexing/

Wherever possible, please adapt this index build to that API.
