Add rpbp test data: per-stage intermediates for nf-core/modules#11695 by pinin4fjords · Pull Request #2064 · nf-core/test-datasets

pinin4fjords · 2026-05-20T10:31:07Z

Summary

Test fixture set for nf-core/modules#11695 (the eight rpbp/* modules + fasta_gtf_bam_rpbp subworkflow) under data/genomics/homo_sapiens/riboseq_expression/rpbp/.

Previously, each rpbp module test in nf-core/modules chained its upstream stages in setup (up to six, along the chain PREPAREGENOME -> EXTRACTMETAGENEPROFILES -> ESTIMATEMETAGENEBAYESFACTORS -> SELECTPERIODICOFFSETS -> GETPERIODICLENGTHSOFFSETS -> EXTRACTORFPROFILES -> ESTIMATEORFBAYESFACTORS) just to produce its single immediate-upstream input. That was slow (several minutes per module on CI) and made module tests harder to debug. With these fixtures, every module test fetches one static file and runs in well under a minute. The full chain is still exercised once, in the subworkflow integration test.
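The setup cost described above can be made concrete with a small sketch. It assumes, as a simplification, that the seven stages named above form a strictly linear chain (the real modules may take additional inputs such as the BAM or reference files); the helper name `upstream_stages` is hypothetical and not part of any nf-core tooling.

```python
# Stage order as listed in the summary above.
RPBP_CHAIN = [
    "PREPAREGENOME",
    "EXTRACTMETAGENEPROFILES",
    "ESTIMATEMETAGENEBAYESFACTORS",
    "SELECTPERIODICOFFSETS",
    "GETPERIODICLENGTHSOFFSETS",
    "EXTRACTORFPROFILES",
    "ESTIMATEORFBAYESFACTORS",
]


def upstream_stages(stage: str) -> list[str]:
    """Return the stages that run before `stage`, assuming a linear chain."""
    return RPBP_CHAIN[: RPBP_CHAIN.index(stage)]


# Report how many stages a chained setup would have to run per module test.
for stage in RPBP_CHAIN:
    print(f"{stage}: {len(upstream_stages(stage))} upstream stage(s) in setup")
```

Under that assumption the last stage needs six upstream stages in a chained setup, whereas a static fixture replaces them with a single file fetch.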

Files

| File | Size | Notes |
|---|---|---|
| `reference.annotated.bed.gz` | 234 KB | Transcript-level annotated BED from `rpbp/preparegenome` |
| `reference.orfs-genomic.annotated.bed.gz` | 1.49 MB | Genomic-coordinate ORF BED from `rpbp/preparegenome` |
| `reference.orfs-exons.annotated.bed.gz` | 1.33 MB | Exon-coordinate ORF BED from `rpbp/preparegenome` |
| `SRX11780888_chr20.metagene-profile.csv.gz` | 53 KB | from `rpbp/extractmetageneprofiles` |
| `SRX11780888_chr20.metagene-periodicity-bayes-factors.csv.gz` | 19 KB | from `rpbp/estimatemetagenebayesfactors` |
| `SRX11780888_chr20.periodic-offsets.csv.gz` | <1 KB | from `rpbp/selectperiodicoffsets` |
| `SRX11780888_chr20.periodic_lengths_offsets.tsv` | 68 B | from `rpbp/getperiodiclengthsoffsets` (lenient `'10 1 None 0.0'` thresholds; chr20 alone does not pass rpbp defaults) |
| `SRX11780888_chr20.profiles.mtx.gz` | 331 KB | from `rpbp/extractorfprofiles` |
| `SRX11780888_chr20.bayes-factors.bed.gz` | 602 KB | from `rpbp/estimateorfbayesfactors` |
| `README.md` | <4 KB | Derivation recipe + per-file size/source. |

Total: 4.1 MB across 10 files. Every file <4 MiB.
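The size claims above could be re-checked on a local checkout with a sketch along these lines. The function name `summarize_fixtures` is hypothetical, and the 4 MiB ceiling is simply the figure quoted in this PR, not a limit read from any nf-core configuration.

```python
from pathlib import Path

MIB = 1024 * 1024
MAX_FILE_BYTES = 4 * MIB  # per-file ceiling quoted in this PR


def summarize_fixtures(directory: str) -> tuple[int, list[str]]:
    """Return (total bytes, names of files at or above the per-file ceiling)."""
    total = 0
    too_big = []
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file():
            continue  # skip subdirectories; only top-level files are counted
        size = path.stat().st_size
        total += size
        if size >= MAX_FILE_BYTES:
            too_big.append(path.name)
    return total, too_big
```

Calling `summarize_fixtures("data/genomics/homo_sapiens/riboseq_expression/rpbp")` from the repository root should return an empty second element if every file is under the ceiling.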

Source

Derived from a single `nf-core subworkflows test fasta_gtf_bam_rpbp` run on the existing chr20 inputs:

- BAM: `aligned_reads/SRX11780888_chr20.bam` (+ `.bai`)
- FASTA: `Homo_sapiens.GRCh38.dna.chromosome.20.fa.gz`
- GTF: `Homo_sapiens.GRCh38.111_chr20.gtf`

(all from this same `riboseq_expression/` folder).

rpbp version: 4.0.1 (Wave container community.wave.seqera.io/library/rpbp:4.0.1--71297b462026e13b).

Test plan

- `modules/nf-core/rpbp/*/tests/main.nf.test` in nf-core/modules#11695 consume these fixtures via raw.githubusercontent.com URLs pinned to this branch; the URLs will be repointed to the `modules` branch after this PR merges.
- `nf-core subworkflows test fasta_gtf_bam_rpbp` (the integration test) still runs the full chain end-to-end.
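The URL pinning mentioned in the test plan can be sketched as follows. The helper `fixture_url` is hypothetical, and the pre-merge owner/repository pair (`pinin4fjords/test-datasets`) is an assumption based on the head branch name shown for this PR; only the general `raw.githubusercontent.com/<owner>/<repo>/<ref>/<path>` layout is relied on.

```python
from urllib.parse import quote

# Fixture directory named in the PR summary.
FIXTURE_DIR = "data/genomics/homo_sapiens/riboseq_expression/rpbp"


def fixture_url(filename: str, owner: str, ref: str, repo: str = "test-datasets") -> str:
    """Build a raw.githubusercontent.com URL for one fixture at a given ref."""
    path = f"{FIXTURE_DIR}/{quote(filename)}"
    return f"https://raw.githubusercontent.com/{owner}/{repo}/{ref}/{path}"


name = "SRX11780888_chr20.periodic-offsets.csv.gz"
# Before merge: pinned to the PR branch (owner/repo assumed, see lead-in).
print(fixture_url(name, owner="pinin4fjords", ref="rpbp-test-data"))
# After merge: repointed to the `modules` branch of nf-core/test-datasets.
print(fixture_url(name, owner="nf-core", ref="modules"))
```

Keeping the owner and ref as parameters means the switch after merge is a change of two arguments rather than an edit to every URL.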

The rpbp module tests in nf-core/modules need their immediate-upstream inputs as static fixtures rather than chaining up to six upstream stages of the rpbp pipeline (PREPAREGENOME -> EXTRACTMETAGENEPROFILES -> ESTIMATEMETAGENEBAYESFACTORS -> SELECTPERIODICOFFSETS -> GETPERIODICLENGTHSOFFSETS -> EXTRACTORFPROFILES -> ESTIMATEORFBAYESFACTORS). With these fixtures, each module test fetches one file rather than re-running the chain, cutting per-module test time from minutes to seconds.

Fixtures are derived from a single fasta_gtf_bam_rpbp subworkflow run on the existing chr20 BAM/FASTA/GTF in this folder. All files <4 MiB, ~4.1 MB total. See data/.../rpbp/README.md for full derivation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Two interleaved changes:

1. Split the periodic-length filter (`get_periodic_lengths_and_offsets`) out of `rpbp/extractorfprofiles` into its own `rpbp/getperiodiclengthsoffsets` module. `extractorfprofiles` now takes a `lengths_offsets` TSV input rather than computing the filter inline. The threshold args move from `ext.args2` on `extractorfprofiles` to `ext.args` on `getperiodiclengthsoffsets`. `fasta_gtf_bam_rpbp` chains the new module between `selectperiodicoffsets` and `extractorfprofiles`.
2. Module-level tests now fetch their single immediate-upstream input from `nf-core/test-datasets:modules` (under `data/genomics/homo_sapiens/riboseq_expression/rpbp/`, added in nf-core/test-datasets#2064) instead of chaining six upstream stages in setup. Each module test runs in well under a minute. The subworkflow integration test still runs the full chain end-to-end.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

pinin4fjords mentioned this pull request May 20, 2026

feat(rpbp): add 8 rpbp modules + fasta_gtf_bam_rpbp subworkflow nf-core/modules#11695

Closed

pinin4fjords wants to merge 1 commit into nf-core:modules from pinin4fjords:rpbp-test-data
