Add rpbp test data: per-stage intermediates for nf-core/modules#11695 #2064
Draft
pinin4fjords wants to merge 1 commit into
The rpbp module tests in nf-core/modules need their immediate-upstream inputs as static fixtures rather than chaining the six upstream stages of the rpbp pipeline (PREPAREGENOME -> EXTRACTMETAGENEPROFILES -> ESTIMATEMETAGENEBAYESFACTORS -> SELECTPERIODICOFFSETS -> GETPERIODICLENGTHSOFFSETS -> EXTRACTORFPROFILES -> ESTIMATEORFBAYESFACTORS). With these fixtures, each module test fetches one file rather than re-running the chain, cutting per-module test time from minutes to seconds.

The fixtures are derived from a single fasta_gtf_bam_rpbp subworkflow run on the existing chr20 BAM/FASTA/GTF in this folder. All files are <4 MiB, ~4.1 MB in total. See data/.../rpbp/README.md for the full derivation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
pinin4fjords added a commit to nf-core/modules that referenced this pull request on May 20, 2026
Two interleaved changes:

1. Split the periodic-length filter (`get_periodic_lengths_and_offsets`) out of `rpbp/extractorfprofiles` into its own `rpbp/getperiodiclengthsoffsets` module. `extractorfprofiles` now takes a `lengths_offsets` TSV input rather than computing the filter inline. The threshold args move from `ext.args2` on `extractorfprofiles` to `ext.args` on `getperiodiclengthsoffsets`. `fasta_gtf_bam_rpbp` chains the new module between `selectperiodicoffsets` and `extractorfprofiles`.
2. Module-level tests now fetch their single immediate-upstream input from `nf-core/test-datasets:modules` (under `data/genomics/homo_sapiens/riboseq_expression/rpbp/`, added in nf-core/test-datasets#2064) instead of chaining six upstream stages in setup. Each module test runs in well under a minute. The subworkflow integration test still runs the full chain end-to-end.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
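As a rough illustration of why chained setup was expensive, the sketch below counts the stages a module test would have to run before reaching its own input. It uses the stage order listed in this PR and treats the chain as strictly linear, which is a simplifying assumption (real modules may take more than one upstream input). The helper names `upstream_stages` and `immediate_upstream` are hypothetical and not part of nf-core tooling.

```python
from typing import List, Optional

# Stage order as listed in the PR description, after the
# getperiodiclengthsoffsets split.
RPBP_CHAIN: List[str] = [
    "PREPAREGENOME",
    "EXTRACTMETAGENEPROFILES",
    "ESTIMATEMETAGENEBAYESFACTORS",
    "SELECTPERIODICOFFSETS",
    "GETPERIODICLENGTHSOFFSETS",
    "EXTRACTORFPROFILES",
    "ESTIMATEORFBAYESFACTORS",
]


def upstream_stages(stage: str) -> List[str]:
    """Stages a chained test setup would have to run before `stage`."""
    return RPBP_CHAIN[: RPBP_CHAIN.index(stage)]


def immediate_upstream(stage: str) -> Optional[str]:
    """The single stage whose output a fixture-based test fetches instead."""
    idx = RPBP_CHAIN.index(stage)
    return RPBP_CHAIN[idx - 1] if idx > 0 else None


if __name__ == "__main__":
    for name in RPBP_CHAIN:
        print(f"{name}: {len(upstream_stages(name))} upstream, "
              f"fixture from {immediate_upstream(name)}")
```

Under this simplification, the last stage in the chain has six upstream stages, while a fixture-based test only needs the output of the one stage directly before it.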
Summary
Test fixture set for nf-core/modules#11695 (the eight `rpbp/*` modules + `fasta_gtf_bam_rpbp` subworkflow) under `data/genomics/homo_sapiens/riboseq_expression/rpbp/`.

Previously each rpbp module test in nf-core/modules chained six upstream stages (`PREPAREGENOME -> EXTRACTMETAGENEPROFILES -> ESTIMATEMETAGENEBAYESFACTORS -> SELECTPERIODICOFFSETS -> GETPERIODICLENGTHSOFFSETS -> EXTRACTORFPROFILES -> ESTIMATEORFBAYESFACTORS`) in its setup just to produce its single immediate-upstream input. That was slow (several minutes per module on CI) and made module tests harder to debug. With these fixtures every module test fetches one static file and runs in well under a minute. The full chain is still exercised once, in the subworkflow integration test.

Files

| File | Produced by |
| --- | --- |
| `reference.annotated.bed.gz` | `rpbp/preparegenome` |
| `reference.orfs-genomic.annotated.bed.gz` | `rpbp/preparegenome` |
| `reference.orfs-exons.annotated.bed.gz` | `rpbp/preparegenome` |
| `SRX11780888_chr20.metagene-profile.csv.gz` | `rpbp/extractmetageneprofiles` |
| `SRX11780888_chr20.metagene-periodicity-bayes-factors.csv.gz` | `rpbp/estimatemetagenebayesfactors` |
| `SRX11780888_chr20.periodic-offsets.csv.gz` | `rpbp/selectperiodicoffsets` |
| `SRX11780888_chr20.periodic_lengths_offsets.tsv` | `rpbp/getperiodiclengthsoffsets` (lenient `'10 1 None 0.0'` thresholds; chr20 alone does not pass rpbp defaults) |
| `SRX11780888_chr20.profiles.mtx.gz` | `rpbp/extractorfprofiles` |
| `SRX11780888_chr20.bayes-factors.bed.gz` | `rpbp/estimateorfbayesfactors` |
| `README.md` | |

Total: 4.1 MB across 10 files. Every file <4 MiB.
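The size limits above can be checked against a local checkout with a few lines of Python. This is a minimal sketch rather than part of the PR: the default directory is the fixture path named in this description, assumed to be relative to the repository root, and the function names are hypothetical.

```python
import sys
from pathlib import Path
from typing import Dict, List

MAX_FILE_BYTES = 4 * 1024 * 1024  # per-file ceiling of 4 MiB stated in the PR

# Assumed location of the fixtures relative to a test-datasets checkout.
DEFAULT_DIR = Path("data/genomics/homo_sapiens/riboseq_expression/rpbp")


def fixture_sizes(fixture_dir: Path) -> Dict[str, int]:
    """Map each regular file in `fixture_dir` to its size in bytes."""
    return {
        p.name: p.stat().st_size
        for p in sorted(fixture_dir.iterdir())
        if p.is_file()
    }


def oversized(sizes: Dict[str, int], limit: int = MAX_FILE_BYTES) -> List[str]:
    """Names of files that are not strictly below `limit` bytes."""
    return [name for name, size in sizes.items() if size >= limit]


if __name__ == "__main__":
    target = Path(sys.argv[1]) if len(sys.argv) > 1 else DEFAULT_DIR
    if target.is_dir():
        sizes = fixture_sizes(target)
        print(f"{len(sizes)} files, {sum(sizes.values()) / 1e6:.1f} MB total")
        print("oversized:", oversized(sizes) or "none")
    else:
        print(f"{target} is not a directory; pass the fixture folder as an argument")
```

Run from the repository root, the script reports the file count and total size alongside any file at or above the 4 MiB limit.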
Source
Single `nf-core subworkflows test fasta_gtf_bam_rpbp` run on the existing chr20 cohort:

- `aligned_reads/SRX11780888_chr20.bam` (+ `.bai`)
- `Homo_sapiens.GRCh38.dna.chromosome.20.fa.gz`
- `Homo_sapiens.GRCh38.111_chr20.gtf`

(all from this same `riboseq_expression/` folder).

rpbp version: `4.0.1` (Wave container `community.wave.seqera.io/library/rpbp:4.0.1--71297b462026e13b`).

Test plan

- `modules/nf-core/rpbp/*/tests/main.nf.test` in nf-core/modules#11695 consume these fixtures via `raw.githubusercontent.com` URLs pinned to this branch; URLs to be updated to `modules` after this PR merges
- `nf-core subworkflows test fasta_gtf_bam_rpbp` (the integration test) still runs the full chain end-to-end
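To make the URL switch in the test plan concrete, here is a small sketch of how a fixture URL changes between a PR branch and `modules`. The `raw.githubusercontent.com/<owner>/<repo>/<ref>/<path>` layout is GitHub's standard raw-file scheme. The branch name and fork owner of this PR are not stated here, so the values used for them below are hypothetical placeholders, and `fixture_url` is an illustrative helper rather than anything in nf-core tooling.

```python
RAW_BASE = "https://raw.githubusercontent.com"
FIXTURE_DIR = "data/genomics/homo_sapiens/riboseq_expression/rpbp"


def fixture_url(filename: str, ref: str = "modules",
                owner: str = "nf-core", repo: str = "test-datasets") -> str:
    """Build the raw URL for one rpbp fixture at a given branch or commit."""
    return f"{RAW_BASE}/{owner}/{repo}/{ref}/{FIXTURE_DIR}/{filename}"


if __name__ == "__main__":
    name = "SRX11780888_chr20.periodic-offsets.csv.gz"
    # While the PR is open: pinned to the PR branch.
    # "some-fork" and "rpbp-fixtures" are placeholders, not the real names.
    print(fixture_url(name, ref="rpbp-fixtures", owner="some-fork"))
    # After merge: the same file on the modules branch of nf-core/test-datasets.
    print(fixture_url(name))
```

Only the owner and ref segments differ between the two URLs, so the post-merge update amounts to swapping those segments in each test file.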