feat: ORF-level differential translation #189

Draft
pinin4fjords wants to merge 8 commits into feat/166-orf-quantification from feat/168-orf-dte

@pinin4fjords
Member

Summary

Closes the modernisation arc on differential translation by adding two complementary paths:

  1. Gene-level TE numerator re-aggregation. ORF_TO_GENE_CDS_COUNTS sums only canonical_cds-class ORFs from the catalogue back to gene level (via orf_to_gene.tsv + catalogue_tsv), replacing the plastid-derived gene-CDS Ribo-seq counts before REPLACE_RIBOSEQ_COUNTS_IN_MATRIX. Keeps the gene-level TE numerator clean of uORF / dORF dynamics.
  2. Per-ORF DTE. DTE_COUNTS_PREP joins the per-ORF Ribo-seq P-site counts (from #166, ORF-level P-site quantification) with the gene-level Salmon RNA-seq counts via orf_to_gene.tsv; novel intergenic ORFs without a host gene drop out. The resulting matrix feeds the existing DESEQ2_DELTATE / ANOTA2SEQ_ANOTA2SEQRUN engines (aliased as *_ORF) at ORF resolution.
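
As a rough illustration of the Tier 1 re-aggregation described in item 1, the sketch below sums per-ORF counts back to gene level while keeping only ORFs classed as canonical_cds. It is not the module's actual code: the function name, the in-memory dictionaries standing in for orf_to_gene.tsv and the catalogue TSV, the sample names, and the "uORF" class label are all assumptions made for the example.

```python
from collections import defaultdict


def aggregate_canonical_cds(orf_counts, orf_to_gene, orf_class):
    """Sum per-ORF counts to gene level, keeping only canonical_cds ORFs.

    orf_counts:  {orf_id: {sample: count}}  (hypothetical per-ORF P-site counts)
    orf_to_gene: {orf_id: gene_id}          (stand-in for orf_to_gene.tsv)
    orf_class:   {orf_id: class_label}      (stand-in for the catalogue TSV)
    """
    gene_counts = defaultdict(lambda: defaultdict(int))
    for orf_id, per_sample in orf_counts.items():
        # Skip uORFs, dORFs and anything else that is not a canonical CDS.
        if orf_class.get(orf_id) != "canonical_cds":
            continue
        gene_id = orf_to_gene.get(orf_id)
        if gene_id is None:
            continue  # no host gene, nothing to aggregate into
        for sample, count in per_sample.items():
            gene_counts[gene_id][sample] += count
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {gene: dict(samples) for gene, samples in gene_counts.items()}


# Made-up example data.
orf_counts = {
    "orf1": {"ribo_1": 100, "ribo_2": 80},  # canonical CDS of geneA
    "orf2": {"ribo_1": 15, "ribo_2": 40},   # upstream ORF of geneA
    "orf3": {"ribo_1": 7, "ribo_2": 9},     # second canonical CDS of geneA
    "orf4": {"ribo_1": 30, "ribo_2": 25},   # canonical CDS of geneB
}
orf_to_gene = {"orf1": "geneA", "orf2": "geneA", "orf3": "geneA", "orf4": "geneB"}
orf_class = {
    "orf1": "canonical_cds",
    "orf2": "uORF",  # assumed label for a non-canonical class
    "orf3": "canonical_cds",
    "orf4": "canonical_cds",
}

gene_counts = aggregate_canonical_cds(orf_counts, orf_to_gene, orf_class)
print(gene_counts)
```

In this toy input the uORF counts for geneA (orf2) never reach the gene-level numerator, which is the point of restricting the sum to canonical_cds.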

Changes

  • New local modules: orf_to_gene_cds_counts, dte_counts_prep.
  • Workflow integration: gated on --extended_orf_analysis true + catalogue exists + --skip_plastid false.
  • New param: --run_dotseq (placeholder; no-op until modules#11742, "Add dotseq/dotseq", lands the DOTSeq module).
  • CHANGELOG entry + params table.

Row-independence caveat

The per-ORF DTE path joins Ribo-seq ORF counts to a gene-level RNA-seq denominator. ORFs sharing a gene-level denominator are perfectly correlated after the join, so per-ORF test statistics underestimate uncertainty for sibling ORFs. Treat the per-ORF p-values as exploratory; the gene-level TE (Tier 1, with canonical-CDS-only aggregation) is the inference-grade output.
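
To make the caveat concrete, here is a minimal sketch of the kind of join DTE_COUNTS_PREP performs, under stated assumptions: the function name, the dictionary inputs, and the sample names are hypothetical, and dropping ORFs whose gene is absent from the RNA-seq matrix is an assumption of the sketch rather than documented behaviour.

```python
def join_orf_ribo_with_gene_rna(orf_ribo, gene_rna, orf_to_gene):
    """Build one row per ORF: its own Ribo-seq counts plus its host gene's RNA-seq counts.

    orf_ribo:    {orf_id: {ribo_sample: count}}
    gene_rna:    {gene_id: {rna_sample: count}}
    orf_to_gene: {orf_id: gene_id}  (stand-in for orf_to_gene.tsv)
    """
    matrix = {}
    for orf_id, ribo in orf_ribo.items():
        gene_id = orf_to_gene.get(orf_id)
        # ORFs without a host gene (e.g. novel intergenic ORFs) drop out.
        # Assumption: ORFs whose gene has no RNA-seq row are dropped too.
        if gene_id is None or gene_id not in gene_rna:
            continue
        matrix[orf_id] = {**ribo, **gene_rna[gene_id]}
    return matrix


# Made-up example data.
orf_ribo = {
    "orf1": {"ribo_1": 100, "ribo_2": 80},  # canonical CDS of geneA
    "orf2": {"ribo_1": 15, "ribo_2": 40},   # upstream ORF of geneA
    "orf5": {"ribo_1": 4, "ribo_2": 6},     # intergenic ORF, no host gene
}
gene_rna = {"geneA": {"rna_1": 500, "rna_2": 450}}
orf_to_gene = {"orf1": "geneA", "orf2": "geneA"}

matrix = join_orf_ribo_with_gene_rna(orf_ribo, gene_rna, orf_to_gene)

# Sibling ORFs carry identical RNA-seq columns after the join.
rna_cols = ("rna_1", "rna_2")
sibling_rna = {orf: tuple(row[c] for c in rna_cols) for orf, row in matrix.items()}
print(sibling_rna)
```

Because orf1 and orf2 receive the same RNA-seq values, their rows are not independent observations of the denominator, which is why the per-ORF p-values are best read as exploratory.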

What's deferred

  • DOTSeq per-ORF DOU integration awaits modules#11742 ("Add dotseq/dotseq"; the separate DOTSeq module PR is still open).
  • FILTER_COUNTS_CANONICAL (restrict gene-level DTE to canonical gene IDs) is on aggregation but is layered in later; if you want it here, it can go in as a follow-up commit.

Stacked PR notes

Thirteenth and final in the stack splitting #174. Targets #188 (feat/166-orf-quantification).

Closes #168

🤖 Generated with Claude Code

(Tier 1) Gene-level TE numerator re-aggregation:
ORF_TO_GENE_CDS_COUNTS sums ONLY canonical_cds ORFs from the
catalogue (via orf_to_gene.tsv + catalogue_tsv classification) back
to gene level, replacing the plastid-derived gene-CDS counts before
REPLACE_RIBOSEQ_COUNTS_IN_MATRIX. Keeps the gene-level TE clean of
uORF / dORF dynamics.

(Tier 2) Per-ORF DTE:
DTE_COUNTS_PREP joins per-ORF Ribo-seq P-site counts with gene-level
Salmon RNA-seq counts via orf_to_gene.tsv. Feeds the existing
DESEQ2_DELTATE / ANOTA2SEQ_ANOTA2SEQRUN engines (aliased) at
ORF resolution.

Gated on --extended_orf_analysis + catalogue exists + --skip_plastid
false. Adds --run_dotseq placeholder (no-op until #11742 lands).

Row-independence caveat: ORFs sharing a gene-level Salmon denominator
are perfectly correlated after the join.
@nf-core-bot
Member

Warning

Newer version of the nf-core template is available.

Your pipeline is using an old version of the nf-core template: 3.5.1.
Please update your pipeline to the latest version.

For more documentation on how to update your pipeline, please see the nf-core documentation and Synchronisation documentation.

pinin4fjords and others added 7 commits May 22, 2026 15:45
… failing

The previous strict-overlap check rejected the common riboseq wiring,
where the secondary matrix is Salmon's gene-level all-sample quant fed
in for its RNA-seq columns but still carries the Ribo-seq columns
alongside. Primary's columns are authoritative for the primary role,
so drop the overlap from secondary and log it to stderr rather than
hard-erroring at runtime.

A degenerate-case guard remains: if secondary has zero columns left
after the drop, the script exits with a clear "no role-specific samples
left" message.
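
The behaviour this commit message describes could look roughly like the sketch below. It is an illustration only, not the script's actual code: the function name, the column names, and the exact wording of the stderr log are assumptions; only the "no role-specific samples left" phrase comes from the message above.

```python
import sys


def drop_overlap_from_secondary(primary_cols, secondary_cols):
    """Return the secondary columns that do not also appear in primary.

    Primary's columns are treated as authoritative, so overlapping columns
    are removed from secondary and reported on stderr instead of erroring.
    """
    primary = set(primary_cols)
    overlap = [c for c in secondary_cols if c in primary]
    if overlap:
        print(
            f"Dropping {len(overlap)} overlapping column(s) from secondary: "
            + ", ".join(overlap),
            file=sys.stderr,
        )
    kept = [c for c in secondary_cols if c not in primary]
    if not kept:
        # Degenerate-case guard: nothing role-specific remains in secondary.
        raise SystemExit("no role-specific samples left in secondary matrix")
    return kept


# Made-up example: the secondary (Salmon, all samples) matrix still carries
# the Ribo-seq columns alongside the RNA-seq ones.
primary_cols = ["ribo_1", "ribo_2"]
secondary_cols = ["ribo_1", "ribo_2", "rna_1", "rna_2"]
kept = drop_overlap_from_secondary(primary_cols, secondary_cols)
print(kept)
```

Raising SystemExit with a message makes the process exit with a non-zero status and print the message to stderr, which matches the "exits with a clear message" behaviour while keeping the function testable.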

This unblocks the existing novel_gtf and stringtie_extended tests
(which were failing on 25.04.8 CI at DTE_COUNTS_PREP (allsamples)) as
well as the ORF-level dotseq / deltate / anota2seq paths.

[skip ci]

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

Wire DOTSEQ_DOTSEQ_ORF through DTE_COUNTS_PREP at ORF resolution.
Drop the --run_dotseq placeholder; dotseq is now a third value for
--translational_efficiency_method.

Module installed from nf-core/modules#11742-pending (registered under
https://github.com/pinin4fjords/nf-core-modules.git so nf-core lint
doesn't hit an interactive prompt under CI's no-TTY shell).

Adds withName blocks for the ORF-level DTE chain plus
extra_orf_dte_args / extra_dotseq_args params, and brings
tests/dotseq.nf.test + snapshot.

[skip ci]