Add pylibCZIrw reader for CZI support #58

Merged
Mr-Milk merged 7 commits into rendeirolab:main from john-mulvey:feat/pylibczi-reader
Apr 30, 2026

@john-mulvey
Contributor

@john-mulvey john-mulvey commented Apr 12, 2026

Summary

Adds PylibCZIReader, a new reader backend built on
pylibCZIrw, Zeiss's officially
maintained Python binding to libCZI. BioFormats cannot decode
JPEG-XR compressed CZI files on arm64 macOS because its ome:jxrlib
native library has no arm64 build
(ome/bioformats#3858,
open since 2022 with no upstream resolution in sight); pylibCZIrw does
not have this limitation.

Motivation

Neither of the existing readers can handle .czi files on arm64 macOS:

  • OpenSlide does not support the CZI format on any platform.
  • BioFormats can parse CZI in principle, but the JPEG-XR codec used
    by Zeiss brightfield scans is provided through ome:jxrlib, whose
    native library has no arm64 macOS build and is not scheduled to get
    one. Any attempt to read a real CZI file on Apple Silicon therefore
    fails at the decode step.

pylibCZIrw is Zeiss's officially maintained binding to libCZI and ships
cross-platform wheels including arm64 macOS, so it plugs the gap
cleanly without pulling in a JVM.

What this PR adds

  • New reader: wsidata/reader/pylibczi.py implementing
    PylibCZIReader(ReaderBase) with name = "pylibczi" and
    pkg_namespaces = "pylibCZIrw", following the same conventions as
    the other backends.
  • Registry: "pylibczi" is added to the auto-detect priority list
    in wsidata/reader/_reader_registry.py, the class is re-exported
    from wsidata/reader/__init__.py, and open_wsi's reader
    Literal type and docstring are extended to list the new backend.
  • Optional dependency: a new pylibczi extra in pyproject.toml
    plus a dev-group entry so CI picks it up.
  • Tests: a test_pylibczi case mirroring test_isyntax, and a
    test_czi fixture in tests/conftest.py. The fixture skips cleanly
    via EntryNotFoundError until a .czi asset is uploaded to
    RendeiroLab/LazySlide-data - see the Test fixture section
    below.
  • Docs: a pylibCZIrw tab-item on the installation page and an
    autosummary entry on the readers API page.

Design decisions

Priority placement before bioformats

pyczi.open_czi raises on any input that is not a valid CZI file, so
READERS.try_open falls straight through to the next backend for any
other format. Placing pylibczi before bioformats therefore has
no cost for non-CZI inputs, and it avoids a silent footgun on arm64
macOS where a .czi file would otherwise auto-select BioFormats and
fail at the first read. The intent is spelled out in a comment above
the priority list in _reader_registry.py.

Empirically verified: non-CZI inputs (wrong magic, empty file,
missing path) all raise a plain RuntimeError from the C++ bindings,
which try_open's except Exception catches cleanly.
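The fall-through argument can be illustrated with a minimal sketch. This is not wsidata's actual `try_open`: the two opener stubs, the `PRIORITY` list and the extension check are hypothetical stand-ins, chosen only to show why a strict backend placed first costs nothing for other formats.

```python
from typing import Callable


def open_czi_stub(path: str) -> str:
    # Hypothetical stand-in for a CZI-only backend: raises on non-CZI input.
    if not path.endswith(".czi"):
        raise RuntimeError(f"not a CZI file: {path}")
    return f"pylibczi:{path}"


def open_generic_stub(path: str) -> str:
    # Hypothetical stand-in for a broader backend that accepts any input.
    return f"bioformats:{path}"


# Hypothetical priority list: the strict backend comes first.
PRIORITY: list[tuple[str, Callable[[str], str]]] = [
    ("pylibczi", open_czi_stub),
    ("bioformats", open_generic_stub),
]


def try_open(path: str) -> str:
    """Return the handle from the first backend that opens `path`."""
    errors = {}
    for name, opener in PRIORITY:
        try:
            return opener(path)
        except Exception as exc:  # the stub's RuntimeError is caught here
            errors[name] = exc
    raise ValueError(f"no reader could open {path}: {errors}")
```

With this ordering, `try_open("slide.czi")` is served by the first stub, while `try_open("slide.svs")` fails fast in the first stub and is picked up by the second.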

Bgr24-only, with an explicit NotImplementedError

The initial implementation supports the Bgr24 pixel type only, which
is what Zeiss microscopes produce for brightfield H&E scans - the
primary use case for an H&E-focused toolkit like wsidata/lazyslide.
Any other pixel type raises NotImplementedError with a message
directing the user to open a follow-up issue, so we fail loudly rather
than silently returning incorrectly-decoded pixels. Broadening the set
of supported pixel types is straightforward follow-up work once a
test fixture exists for each.
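A rough sketch of the fail-loudly guard, assuming the pixel type is available as a string; the function name, the `SUPPORTED_PIXEL_TYPES` set and the message wording are illustrative and not taken from the reader itself.

```python
# Hypothetical allow-list; the PR supports Bgr24 only.
SUPPORTED_PIXEL_TYPES = {"Bgr24"}


def check_pixel_type(pixel_type: str) -> None:
    """Raise instead of decoding a pixel layout the reader does not handle."""
    if pixel_type not in SUPPORTED_PIXEL_TYPES:
        raise NotImplementedError(
            f"Pixel type {pixel_type!r} is not supported yet; "
            "only Bgr24 is handled. Please open an issue to request support."
        )
```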

Multi-scene CZIs warn rather than fail

Multi-scene CZIs are read as scene 0 with a UserWarning naming the
total scene count. A scene=N parameter is deliberately not added:
the shape of a multi-scene API should be decided once across all
wsidata readers, not set unilaterally here.
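The warn-and-continue behaviour could look roughly like this; `select_scene` and the message text are hypothetical, and only the standard `warnings` module is assumed.

```python
import warnings


def select_scene(n_scenes: int) -> int:
    """Return the scene index to read, warning when other scenes are ignored."""
    if n_scenes > 1:
        warnings.warn(
            f"CZI file contains {n_scenes} scenes; only scene 0 will be read.",
            UserWarning,
            stacklevel=2,  # point the warning at the caller, not this helper
        )
    return 0
```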

Synthetic pyramid via pylibCZIrw's zoom parameter

pylibCZIrw does not expose pre-baked pyramid levels, so the reader
presents a synthetic pyramid of six levels (1x, 2x, 4x, 8x,
16x, 32x) by setting the appropriate zoom value on every
read_region call. This is the sanctioned way to obtain
lower-resolution views from pylibCZIrw, and it matches how raw .czi
files from Zeiss microscopes are typically consumed (they are often
not pre-tiled).
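For illustration, the six synthetic levels and their zoom values can be derived as below. The `Level` dataclass and `synthetic_pyramid` helper are hypothetical, and the floor-division rounding of level dimensions is an assumption; the reader may round differently.

```python
from dataclasses import dataclass

N_LEVELS = 6  # 1x, 2x, 4x, 8x, 16x, 32x


@dataclass(frozen=True)
class Level:
    downsample: int
    zoom: float  # value passed as the zoom factor for this level
    width: int
    height: int


def synthetic_pyramid(width: int, height: int, n_levels: int = N_LEVELS) -> list[Level]:
    """Build power-of-two levels from the full-resolution dimensions."""
    levels = []
    for i in range(n_levels):
        downsample = 2**i
        levels.append(
            Level(
                downsample=downsample,
                zoom=1.0 / downsample,
                # Assumption: dimensions are floored, with a minimum of 1 px.
                width=max(1, width // downsample),
                height=max(1, height // downsample),
            )
        )
    return levels
```

For the 256 x 256 test file mentioned under Verification, this yields level widths of 256, 128, 64, 32, 16 and 8 pixels.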

Coordinate translation from CZI absolute origin to zero origin

CZI files store coordinates in an absolute reference frame whose
origin can be far from (0, 0). The reader records
scenes_bounding_rectangle[0].x, .y at construction time and
translates every get_region request back into the CZI absolute
frame, so read_region(0, 0, ...) returns the top-left of the scene.
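A small sketch of the translation, assuming the request is expressed in level-0 pixels and that the scene origin was captured once at construction. `Rect` and `to_czi_roi` are hypothetical names used only for this example.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    x: int
    y: int
    w: int
    h: int


def to_czi_roi(
    scene_origin: tuple[int, int], x: int, y: int, width: int, height: int
) -> Rect:
    """Translate a zero-origin request into the CZI absolute frame."""
    origin_x, origin_y = scene_origin
    # Only the offset changes; the requested size is passed through unchanged.
    return Rect(origin_x + x, origin_y + y, width, height)
```

With a scene origin of `(-15000, 42000)`, a request at `(0, 0)` maps to `(-15000, 42000)` in the absolute frame, which is the top-left of the scene.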

BGR-to-RGB conversion

pylibCZIrw returns raw BGR for Bgr24, not the RGBA layout assumed
by ReaderBase.convert_image, so get_region does its own
cv2.cvtColor - see the inline comment.
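The channel swap itself is simple. The sketch below uses a NumPy slice instead of the `cv2.cvtColor` call the reader uses, purely to keep the example free of an OpenCV dependency; for a three-channel `uint8` array the two should give the same result. `bgr_to_rgb` is a hypothetical helper name.

```python
import numpy as np


def bgr_to_rgb(bgr: np.ndarray) -> np.ndarray:
    """Reverse the channel axis of an (H, W, 3) BGR array."""
    if bgr.ndim != 3 or bgr.shape[2] != 3:
        raise ValueError(f"expected an (H, W, 3) array, got shape {bgr.shape}")
    # The reversed slice is a view with negative strides; copy it so that
    # downstream consumers receive a contiguous array.
    return np.ascontiguousarray(bgr[..., ::-1])
```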

Reader lifecycle via context manager

pylibCZIrw exposes open_czi only as a context manager, so
PylibCZIReader holds an __enter__/__exit__ pair on the instance
and drives them from create_reader/detach_reader. A code comment
flags the pattern so it is not "simplified" away.
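The pattern can be sketched with a generic context manager. `open_resource` is a hypothetical stand-in for a library function that is only usable via `with`, and `ManualLifecycleReader` is not the PR's class; only the `create_reader`/`detach_reader` method names come from the description above.

```python
from contextlib import contextmanager


@contextmanager
def open_resource(path, log):
    # Hypothetical stand-in for an open function exposed only as a context manager.
    log.append(f"open {path}")
    try:
        yield {"path": path}
    finally:
        log.append(f"close {path}")


class ManualLifecycleReader:
    def __init__(self, path, log):
        self._path = path
        self._log = log
        self._cm = None  # the context manager object, kept so it can be exited
        self._doc = None  # the value produced by __enter__

    def create_reader(self):
        if self._doc is None:
            self._cm = open_resource(self._path, self._log)
            self._doc = self._cm.__enter__()
        return self._doc

    def detach_reader(self):
        if self._cm is not None:
            self._cm.__exit__(None, None, None)
            self._cm = None
            self._doc = None
```

Keeping a reference to the context manager object itself, not just the value returned by `__enter__`, is what makes the later `__exit__` call possible. `contextlib.ExitStack` from the standard library is an alternative way to hold a context open across method calls.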

Incidental fix: FastSlideReader registration

While testing the auto-detect priority walk against a real .czi
file, I found that open_wsi() raised
KeyError: "Cannot find reader 'fastslide' in registry." on every
auto-detect, even when fastslide was not installed. Root cause:
FastSlideReader is listed in the priority order but its class
definition was missing the @register(name="fastslide") decorator,
so the registry lookup inside try_open fails before it can fall
through to the next backend. This PR adds the missing decorator (and
its import) to wsidata/reader/fastslide.py. Happy to split this
into a separate PR if maintainers prefer.
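The failure mode can be reproduced with a toy registry. Everything here (`REGISTRY`, `PRIORITY`, `lookup`, `walk_priority` and the stub classes) is a hypothetical reduction, not wsidata's registry code; only the `register(name=...)` decorator shape and the error message follow the description above.

```python
REGISTRY = {}
PRIORITY = ["fastslide", "openslide"]


def register(name):
    """Class decorator that records the class under `name`."""

    def decorator(cls):
        REGISTRY[name] = cls
        return cls

    return decorator


def lookup(name):
    try:
        return REGISTRY[name]
    except KeyError:
        raise KeyError(f"Cannot find reader '{name}' in registry.") from None


class FastSlideStub:
    # Listed in PRIORITY but never registered: the decorator is missing.
    pass


@register(name="openslide")
class OpenSlideStub:
    pass


def walk_priority():
    """Resolve every priority entry; an unregistered name aborts the walk."""
    return [lookup(name) for name in PRIORITY]
```

As written, `walk_priority()` raises `KeyError` on the first entry before `openslide` is ever considered. Applying `register(name="fastslide")` to `FastSlideStub` lets the walk complete.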

Test fixture

The test_pylibczi case expects a sample.czi fixture in the
RendeiroLab/LazySlide-data
HuggingFace dataset, mirroring how sample.svs is already used for
the OpenSlide and TiffSlide tests.

The natural candidate is
c1_bgr24.czi,
a ~530 KB single-channel 24-bit BGR file from Zeiss's own pylibCZIrw
test suite - a known-clean Bgr24 CZI matching this reader's
supported pixel type.

Licence caveat. pylibCZIrw is dual-licensed under LGPL-3.0/GPL-3.0
and I could not find an explicit re-distribution statement for its
test_data/; maintainers may want to confirm with Zeiss before
uploading.

Until the fixture lands, test_pylibczi skips cleanly via
hf_hub_download raising EntryNotFoundError, which the fixture
catches with pytest.skip(...). CI remains green.

I have smoke-tested the reader locally against c1_bgr24.czi, and
separately against a real H&E brightfield .czi acquired in the
course of the project this work grew out of. I have also verified
pixel-level parity against the standalone pylibCZIrw-based reader I
originally wrote for that project, from which this backend is
ported.

Verification

Run locally on arm64 macOS with Python 3.11, inside a clean
uv sync --dev environment plus uv pip install pylibCZIrw:

  • uv run task fmt - clean.
  • uv run ruff check on the modified files - clean (see note on
    unrelated upstream issues below).
  • uv run task test - 47 passed, 3 skipped. test_pylibczi skips
    cleanly because the HF fixture does not yet exist; test_cucim
    skips because cucim is not installed on macOS; one other pre-existing
    skip is unrelated.
  • End-to-end smoke test via open_wsi("c1_bgr24.czi"): auto-detect
    correctly walks the priority list, picks PylibCZIReader, and
    returns the expected shape=[256, 256], n_level=6,
    mpp=2.496, plus a valid (64, 64, 3) region and
    (128, 128, 3) thumbnail.

Test plan

  • Upload c1_bgr24.czi (or another small Bgr24 CZI) as
    sample.czi to RendeiroLab/LazySlide-data, subject to
    licence review.
    Download from GitHub directly
  • Confirm uv run task test runs test_pylibczi green once the
    fixture is live.
  • Confirm CI passes across the existing matrix. pylibCZIrw 6.0.1
    publishes cp311/cp312/cp313 wheels for macOS arm64, Linux
    x86_64, Linux aarch64, and Windows x86_64, so every cell of
    the current {3.11, 3.12, 3.13} x {ubuntu, macos, windows} CI
    matrix should get a prebuilt wheel. There is no macOS x86_64
    wheel, but the CI uses macos-latest which is arm64, so this
    does not affect the matrix.
  • Confirm the rendered docs build (uv run task doc-build) and
    the new reader appears on the readers API page and
    installation page.

Closes #59

FastSlideReader is listed in ReaderRegistry.priority but its class
definition was missing the @register(name="fastslide") decorator,
so open_wsi() auto-detect raised KeyError on every call, even when
fastslide was not installed. Adds the missing decorator and import.

Adds a new PylibCZIReader backend built on pylibCZIrw, Zeiss's
officially maintained Python binding to libCZI. BioFormats cannot
decode JPEG-XR compressed CZI files on arm64 macOS because its
ome:jxrlib native library has no arm64 build; pylibCZIrw does not
have this limitation.

The reader is placed ahead of bioformats in the auto-detect
priority list because pyczi.open_czi raises on any non-CZI input,
so a .czi file on arm64 macOS would otherwise auto-select
BioFormats and fail at first read.
@Mr-Milk Mr-Milk self-assigned this Apr 18, 2026
@Mr-Milk Mr-Milk changed the title from "Add pylibCZIrw reader for CZI support on arm64 macOS" to "Add pylibCZIrw reader for CZI support" Apr 30, 2026
@Mr-Milk
Member

Mr-Milk commented Apr 30, 2026

@john-mulvey Thanks a lot for your contribution on the CZI reader, all the tests passed! I will now merge it.

@Mr-Milk Mr-Milk merged commit b62ed59 into rendeirolab:main Apr 30, 2026
11 checks passed

Development

Successfully merging this pull request may close these issues.

Large .czi breaks java.bioformats

2 participants