Add Zarr support to scanpy/pca#11697

Open
ehsanestaji wants to merge 4 commits into nf-core:master from ehsanestaji:scverse-zarr-scanpy-pca

@ehsanestaji
Contributor

@ehsanestaji ehsanestaji commented May 19, 2026

PR checklist

Part of #11559

This PR adds Zarr input/output support to scanpy/pca as a first scverse module slice.

Changes:

  • Detect .zarr inputs and emit a matching .zarr AnnData output directory.
  • Preserve existing .h5ad behavior for .h5ad inputs.
  • Add zarr as an optional output channel and document it in meta.yml.
  • Add a small AnnData Zarr fixture plus stub and real Zarr tests.
  • Broadcast versions.yml on topic: versions.
  • Avoid importing scanpy in the stub version command by using importlib.metadata.version("scanpy").
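The extension handling and the import-free version lookup listed above can be sketched as follows. This is a minimal illustration, not the module's actual script: the helper names `output_name` and `package_version`, the `prefix` argument, and the "not installed" fallback string are hypothetical. Only `importlib.metadata.version` is taken from the PR description.

```python
from importlib import metadata
from pathlib import Path


def output_name(input_path: str, prefix: str) -> str:
    """Return an output name whose extension matches the input format.

    Hypothetical helper: the real module may structure this differently.
    """
    # A Zarr store is a directory, so the path may carry a trailing slash.
    suffix = Path(input_path.rstrip("/")).suffix
    if suffix == ".zarr":
        return f"{prefix}.zarr"
    if suffix == ".h5ad":
        return f"{prefix}.h5ad"
    raise ValueError(f"Unsupported AnnData input: {input_path}")


def package_version(name: str) -> str:
    """Look up an installed package's version without importing the package."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        # Assumed fallback for illustration only.
        return "not installed"
```

Reading the version from package metadata keeps a stub run cheap because it does not import the package itself.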

Validation run locally on 2026-05-19:

  • conda run -n nfcore-topic nf-core modules lint scanpy/pca --local --plain-text

  • conda run -n nfcore-topic prek run --files $(git diff --name-only)

  • conda run -n nfcore-topic nf-test test modules/nf-core/scanpy/pca/tests/main.nf.test --profile docker --filter process --verbose

  • conda run -n nfcore-topic nf-test test modules/nf-core/scanpy/pca/tests/main.nf.test --profile conda --filter process --verbose

  • git diff --check

  • This comment contains a description of changes (with reason).

  • If you've fixed a bug or added code that should be tested, add tests!

  • If you've added a new tool - have you followed the module conventions in the contribution docs

  • If necessary, include test data in your PR.

  • Remove all TODO statements.

  • Broadcast software version numbers to topic: versions - See version_topics

  • Follow the naming conventions.

  • Follow the parameters requirements.

  • Follow the input/output options guidelines.

  • Add a resource label

  • Use BioConda and BioContainers if possible to fulfil software requirements.

  • Ensure that the test works with either Docker / Singularity. Conda CI tests can be quite flaky:

    • For modules:
      • nf-core modules test scanpy/pca --profile docker
      • nf-core modules test scanpy/pca --profile singularity
      • nf-core modules test scanpy/pca --profile conda

@ehsanestaji
Contributor Author

Follow-up pushed in 774206f. This addresses the local/CI issues I could reproduce: versions.yml now broadcasts to topic: versions, the Zarr fixture metadata files have final newlines for pre-commit, and the h5ad test now checks output presence/content instead of MD5 snapshots that differ across runtimes.

Local validation passed on 2026-05-19:

  • conda run -n nfcore-topic nf-core modules lint scanpy/pca --local --plain-text
  • conda run -n nfcore-topic prek run --files $(git diff --name-only)
  • conda run -n nfcore-topic nf-test test modules/nf-core/scanpy/pca/tests/main.nf.test --profile docker --filter process --verbose
  • conda run -n nfcore-topic nf-test test modules/nf-core/scanpy/pca/tests/main.nf.test --profile conda --filter process --verbose
  • git diff --check

The GitHub workflow runs for this new commit currently show action_required before jobs start, so a maintainer may need to approve them when convenient.

@ehsanestaji ehsanestaji force-pushed the scverse-zarr-scanpy-pca branch from 774206f to 71bbc5a Compare May 19, 2026 21:51
@ehsanestaji ehsanestaji marked this pull request as ready for review May 19, 2026 21:52
@ehsanestaji
Contributor Author

Update: rebased this PR onto current nf-core/modules:master and marked it ready for review. Latest head is 71bbc5a11.

Fresh local validation after the rebase:

  • conda run -n nfcore-topic nf-core modules lint scanpy/pca --local --plain-text
  • conda run -n nfcore-topic prek run --files $(git diff --name-only upstream/master...HEAD)
  • conda run -n nfcore-topic nf-test test modules/nf-core/scanpy/pca/tests/main.nf.test --profile docker --filter process --verbose
  • git diff --check

The GitHub workflows are still awaiting maintainer approval before they can run.

Contributor

@SPPearce SPPearce left a comment

You seem to have some files in the tests folder that aren't required.
Is there a Zarr file that we could test on instead, taken from the test-datasets repository?

@ehsanestaji ehsanestaji force-pushed the scverse-zarr-scanpy-pca branch from 71bbc5a to e349fc3 Compare May 24, 2026 16:12
@ehsanestaji
Contributor Author

Moved the Zarr fixture out of the module tests directory and updated the tests to use nf-core/test-datasets via params.modules_testdata_base_path. Since Zarr is directory-backed, the fixture is stored in test-datasets as a small .zarr.tar.gz archive and unpacked with UNTAR during nf-test setup.

Prerequisite test-data PR: nf-core/test-datasets#2076. I validated the module tests locally against that forked test-data branch; the official test-data URL will work once that PR is merged.
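The archive-and-unpack approach described above, where a directory-backed store is shipped as a single `.tar.gz` file and restored before the test runs, can be sketched with Python's standard `tarfile` module. This is only an illustration of the idea: in the PR the unpacking is done by the UNTAR module, and the helper names `pack_store` and `unpack_store` are hypothetical.

```python
import tarfile
from pathlib import Path


def pack_store(store_dir: Path, archive: Path) -> Path:
    """Pack a directory-backed store into a single .tar.gz file."""
    with tarfile.open(archive, "w:gz") as tar:
        # arcname keeps only the store's own directory name inside the archive.
        tar.add(store_dir, arcname=store_dir.name)
    return archive


def unpack_store(archive: Path, dest: Path) -> Path:
    """Unpack the archive into dest and return the restored store path."""
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        # The first member is the top-level store directory added above.
        top_level = tar.getnames()[0].split("/")[0]
        # Use the safer "data" extraction filter where the runtime supports it.
        if hasattr(tarfile, "data_filter"):
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    return dest / top_level
```

Packing the store as one file lets a directory tree be fetched from a single URL, at the cost of an extra unpacking step in the test setup.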

@SPPearce
Contributor

Can I suggest that you go to the #github-invitations channel in the nf-core Slack? You can be added to the organisation there so that the tests run.

@ehsanestaji
Contributor Author

I requested access in the nf-core Slack #github-invitations channel. Once I’m added to the organisation, the CI should be able to run on this PR.

Contributor

@nictru nictru left a comment

Please incorporate the feedback Lukas and I provided in #11756 so we can make sure the Zarr support is structured the same way for all affected modules.

Comment on lines -15 to +18
- tuple val(meta), path("*.h5ad") , emit: h5ad
+ tuple val(meta), path("*.h5ad") , optional: true, emit: h5ad
  tuple val(meta), path("X_*.pkl"), emit: obsm
- path "versions.yml" , emit: versions
+ path "versions.yml" , emit: versions, topic: versions
+ tuple val(meta), path("*.zarr") , optional: true, emit: zarr
Contributor

Please follow the same structure as in #11756

- adata = sc.read_h5ad("${h5ad}")
+ input_file = "${h5ad}"
+ output_file = "${output_file}"
+ adata = ad.read_zarr(input_file) if input_file.endswith(".zarr") else sc.read_h5ad(input_file)
Contributor

Please keep the logic structure consistent with #11756


3 participants