feat: cross-sample ORF catalogue #187

Draft
pinin4fjords wants to merge 7 commits into feat/169-rpbp from feat/167-orf-catalogue
@pinin4fjords
Member

Summary

Builds a cohort-level ORF catalogue across all enabled callers (Ribo-TISH, RiboCode, Ribotricer, Rp-Bp, PRICE) by normalising each per-sample output to a unified BED12 + sidecar TSV, then merging with a class-aware collapse:

  • Transcript-ID grouping for annotated multi-exon CDS.
  • 80% reciprocal overlap for single-exon novel intergenic and smORFs (≤100 aa).
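
To make the second rule concrete, here is a minimal Python sketch of an 80% reciprocal-overlap test for two single-exon ORFs. It is not the custom/orfmerge implementation, which this PR does not show. The function names, the dictionary fields, the same-chromosome/same-strand requirement, the half-open coordinates, and the inclusive threshold are all assumptions made for illustration.

```python
def reciprocal_overlap(a_start, a_end, b_start, b_end):
    """Return the smaller of the two overlap fractions.

    Intervals are treated as half-open [start, end), as in BED
    (an assumption about how the merge step reads coordinates).
    """
    overlap = min(a_end, b_end) - max(a_start, b_start)
    if overlap <= 0:
        return 0.0
    # "Reciprocal" means the overlap must cover the required fraction
    # of BOTH intervals, so take the minimum of the two fractions.
    return min(overlap / (a_end - a_start), overlap / (b_end - b_start))


def should_collapse(a, b, threshold=0.8):
    """Hypothetical helper: collapse two single-exon ORFs when they share
    chromosome and strand and meet the reciprocal-overlap threshold."""
    if (a["chrom"], a["strand"]) != (b["chrom"], b["strand"]):
        return False
    frac = reciprocal_overlap(a["start"], a["end"], b["start"], b["end"])
    return frac >= threshold


# Illustrative records, not real predictions.
orf_a = {"chrom": "chr1", "strand": "+", "start": 100, "end": 400}
orf_b = {"chrom": "chr1", "strand": "+", "start": 130, "end": 400}
orf_c = {"chrom": "chr1", "strand": "+", "start": 100, "end": 200}

collapse_ab = should_collapse(orf_a, orf_b)  # overlap covers 90% of a, 100% of b
collapse_ac = should_collapse(orf_a, orf_c)  # overlap covers only a third of a
```

Taking the minimum of the two fractions is what stops a short ORF nested inside a much longer one from being collapsed into it: orf_c lies entirely within orf_a, yet the pair fails the test because the overlap covers only a third of orf_a.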

Emits orf_catalogue.{bed12,tsv}, orf_to_gene.tsv, and an AA FASTA under <outdir>/orf_catalogue/, plus a MultiQC custom-content per-class count table.

Changes

  • Install upstream subworkflow orftable_fasta_gtf_buildorfcatalogue (nf-core/modules#11740, "feat(custom): orfnormalise + orfmerge modules + orftable_fasta_gtf_buildorfcatalogue subworkflow") and its module dependencies (custom/orfnormalise, custom/orfmerge, bedtools/getfasta, seqkit/translate).
  • Per-caller prediction channels (ch_ribotish_predictions, ch_ribocode_predictions, ch_ribotricer_predictions, ch_rpbp_predictions, ch_price_predictions) default to Channel.empty() and are overridden inside each caller's if-block.
  • Gating: catalogue invocation requires --extended_orf_analysis true and at least one ORF caller enabled.
  • nextflow.config includes the subworkflow's bundled config for BEDTOOLS_GETFASTA / SEQKIT_TRANSLATE defaults.
  • conf/modules.config: withName blocks for the three subworkflow processes, publishing under <outdir>/orf_catalogue/.

🚨 Upstream dependency (blocker)

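Depends on nf-core/modules#11740 (feat(custom): orfnormalise + orfmerge modules + orftable_fasta_gtf_buildorfcatalogue subworkflow), currently OPEN. modules.json is pinned to the PR branch SHA. Before this PR can leave draft, #11740 needs to merge and the pins need to be swapped to master with nf-core modules update / subworkflows update.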

Stacked PR notes

Eleventh in the stack splitting #174. Targets #186 (feat/169-rpbp).

Closes #167

🤖 Generated with Claude Code

Gather per-sample, per-caller ORF predictions (Ribo-TISH, RiboCode,
Ribotricer, Rp-Bp, PRICE), normalise each to a unified BED12 + sidecar
TSV, then merge into a cohort-level catalogue with a class-aware
strategy (transcript-ID grouping for annotated multi-exon CDS, 80%
reciprocal overlap for single-exon novel intergenic and smORFs).
Emits orf_catalogue.{bed12,tsv}, orf_to_gene.tsv, and an AA FASTA
under <outdir>/orf_catalogue/, plus a MultiQC custom-content per-class
count table.

Implementation uses the upstream orftable_fasta_gtf_buildorfcatalogue
subworkflow (nf-core/modules#11740): CUSTOM_ORFNORMALISE per caller,
CUSTOM_ORFMERGE for cohort-level merge, BEDTOOLS_GETFASTA +
SEQKIT_TRANSLATE to produce the catalogue AA FASTA.

Per-caller prediction channels (ch_*_predictions) default to
Channel.empty() and are overridden inside each caller's if-block,
gating the catalogue invocation on extended_orf_active +
at-least-one-caller.

modules.json currently pins custom/orfnormalise, custom/orfmerge,
and the orftable_fasta_gtf_buildorfcatalogue subworkflow to
nf-core/modules#11740 (branch custom-orf-catalogue, sha 6597190c).
Once #11740 merges, run nf-core modules update / subworkflows update
to swap pins to master.
@nf-core-bot
Member

Warning

Newer version of the nf-core template is available.

Your pipeline is using an old version of the nf-core template: 3.5.1.
Please update your pipeline to the latest version.

For more documentation on how to update your pipeline, please see the nf-core documentation and Synchronisation documentation.
