Add rpbp test data: per-stage intermediates for nf-core/modules#11695 #2064

Draft
pinin4fjords wants to merge 1 commit into nf-core:modules from pinin4fjords:rpbp-test-data

@pinin4fjords
Member

Summary

Test fixture set for nf-core/modules#11695 (the eight rpbp/* modules + fasta_gtf_bam_rpbp subworkflow) under data/genomics/homo_sapiens/riboseq_expression/rpbp/.

Previously each rpbp module test in nf-core/modules chained six upstream stages (PREPAREGENOME -> EXTRACTMETAGENEPROFILES -> ESTIMATEMETAGENEBAYESFACTORS -> SELECTPERIODICOFFSETS -> GETPERIODICLENGTHSOFFSETS -> EXTRACTORFPROFILES -> ESTIMATEORFBAYESFACTORS) in its setup just to produce its single immediate-upstream input. That was slow (several minutes per module on CI) and made module tests harder to debug. With these fixtures every module test fetches one static file and runs in well under a minute. The full chain is still exercised once, in the subworkflow integration test.

Files

| File | Size | Notes |
| --- | --- | --- |
| reference.annotated.bed.gz | 234 KB | Transcript-level annotated BED from rpbp/preparegenome |
| reference.orfs-genomic.annotated.bed.gz | 1.49 MB | Genomic-coordinate ORF BED from rpbp/preparegenome |
| reference.orfs-exons.annotated.bed.gz | 1.33 MB | Exon-coordinate ORF BED from rpbp/preparegenome |
| SRX11780888_chr20.metagene-profile.csv.gz | 53 KB | from rpbp/extractmetageneprofiles |
| SRX11780888_chr20.metagene-periodicity-bayes-factors.csv.gz | 19 KB | from rpbp/estimatemetagenebayesfactors |
| SRX11780888_chr20.periodic-offsets.csv.gz | <1 KB | from rpbp/selectperiodicoffsets |
| SRX11780888_chr20.periodic_lengths_offsets.tsv | 68 B | from rpbp/getperiodiclengthsoffsets (lenient '10 1 None 0.0' thresholds; chr20 alone does not pass rpbp defaults) |
| SRX11780888_chr20.profiles.mtx.gz | 331 KB | from rpbp/extractorfprofiles |
| SRX11780888_chr20.bayes-factors.bed.gz | 602 KB | from rpbp/estimateorfbayesfactors |
| README.md | <4 KB | Derivation recipe + per-file size/source. |

Total: 4.1 MB across 10 files. Every file <4 MiB.
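
The size limits above could be checked locally with something like the following. This is a minimal sketch, not part of the PR: the function names `oversized_files` and `total_megabytes` are hypothetical, and the 4 MiB ceiling is taken from the line above rather than from any documented repository policy.

```python
from pathlib import Path

# 4 MiB per-file ceiling, as quoted in the PR description (assumption: binary MiB).
MAX_BYTES = 4 * 1024 * 1024


def oversized_files(fixture_dir, limit=MAX_BYTES):
    """Return sorted (name, size) pairs for regular files at or above the limit."""
    root = Path(fixture_dir)
    return sorted(
        (p.name, p.stat().st_size)
        for p in root.iterdir()
        if p.is_file() and p.stat().st_size >= limit
    )


def total_megabytes(fixture_dir):
    """Sum the sizes of all regular files in the directory, in decimal MB."""
    root = Path(fixture_dir)
    return sum(p.stat().st_size for p in root.iterdir() if p.is_file()) / 1e6
```

Running `oversized_files("data/genomics/homo_sapiens/riboseq_expression/rpbp")` from a checkout of this branch would be expected to return an empty list if every file is under the limit.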

Source

Generated from a single nf-core subworkflows test fasta_gtf_bam_rpbp run on the existing chr20 test data:

  • BAM: aligned_reads/SRX11780888_chr20.bam (+ .bai)
  • FASTA: Homo_sapiens.GRCh38.dna.chromosome.20.fa.gz
  • GTF: Homo_sapiens.GRCh38.111_chr20.gtf

(all from this same riboseq_expression/ folder).

rpbp version: 4.0.1 (Wave container community.wave.seqera.io/library/rpbp:4.0.1--71297b462026e13b).

Test plan

  • modules/nf-core/rpbp/*/tests/main.nf.test in nf-core/modules#11695 consume these fixtures via raw.githubusercontent.com URLs pinned to this branch; the URLs will be repointed to the modules branch after this PR merges
  • nf-core subworkflows test fasta_gtf_bam_rpbp (the integration test) still runs the full chain end-to-end
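
To illustrate the URL switch described in the first bullet, here is a hedged sketch of how a pinned fixture URL might be assembled. The helper `raw_fixture_url` is hypothetical, and it assumes the fork is named `test-datasets` under `pinin4fjords`; only the host, the branch names, and the fixture directory come from this PR.

```python
from urllib.parse import quote

RAW_HOST = "https://raw.githubusercontent.com"
# Fixture directory as given in the PR summary.
FIXTURE_DIR = "data/genomics/homo_sapiens/riboseq_expression/rpbp"


def raw_fixture_url(owner, repo, ref, filename, base_dir=FIXTURE_DIR):
    """Build a raw.githubusercontent.com URL for one fixture at a given ref."""
    path = f"{base_dir}/{filename}"
    # quote() leaves '/', letters, digits and '_.-~' untouched by default.
    return f"{RAW_HOST}/{owner}/{repo}/{quote(ref)}/{quote(path)}"


# While this PR is open (assumed fork name: pinin4fjords/test-datasets):
branch_url = raw_fixture_url(
    "pinin4fjords", "test-datasets", "rpbp-test-data",
    "SRX11780888_chr20.profiles.mtx.gz",
)
# After merge, only the owner and ref change:
merged_url = raw_fixture_url(
    "nf-core", "test-datasets", "modules",
    "SRX11780888_chr20.profiles.mtx.gz",
)
```

Keeping the owner and ref as parameters means the post-merge update only touches two values per test.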

The rpbp module tests in nf-core/modules need their immediate-upstream
inputs as static fixtures rather than chaining the six upstream stages
of the rpbp pipeline (PREPAREGENOME -> EXTRACTMETAGENEPROFILES ->
ESTIMATEMETAGENEBAYESFACTORS -> SELECTPERIODICOFFSETS ->
GETPERIODICLENGTHSOFFSETS -> EXTRACTORFPROFILES -> ESTIMATEORFBAYESFACTORS).
With these fixtures, each module test fetches one file rather than
re-running the chain, cutting per-module test time from minutes to
seconds.

Fixtures derived from a single fasta_gtf_bam_rpbp subworkflow run on
the existing chr20 BAM/FASTA/GTF in this folder. All files <4 MiB,
~4.1 MB total. See data/.../rpbp/README.md for full derivation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
pinin4fjords added a commit to nf-core/modules that referenced this pull request May 20, 2026
Two interleaved changes:

1. Split the periodic-length filter (`get_periodic_lengths_and_offsets`)
   out of `rpbp/extractorfprofiles` into its own
   `rpbp/getperiodiclengthsoffsets` module. `extractorfprofiles` now
   takes a `lengths_offsets` TSV input rather than computing the filter
   inline. The threshold args move from `ext.args2` on
   `extractorfprofiles` to `ext.args` on `getperiodiclengthsoffsets`.
   `fasta_gtf_bam_rpbp` chains the new module between
   `selectperiodicoffsets` and `extractorfprofiles`.

2. Module-level tests now fetch their single immediate-upstream input
   from `nf-core/test-datasets:modules` (under
   `data/genomics/homo_sapiens/riboseq_expression/rpbp/`, added in
   nf-core/test-datasets#2064) instead of chaining six upstream stages
   in setup. Each module test runs in well under a minute. The
   subworkflow integration test still runs the full chain end-to-end.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>