
feat: support task.ext.args in sentieon/gvcftyper #11763

Open
nicorap wants to merge 2 commits into master from feat/sentieon-gvcftyper-ext-args


Conversation

nicorap commented May 26, 2026

Description

Add task.ext.args support to SENTIEON_GVCFTYPER, matching the convention already used by sibling sentieon modules (haplotyper, varcal, applyvarcal,
dnascope, etc.).

+ def args = task.ext.args ?: ''
  ...
  sentieon driver \
      -r ${fasta} \
      ${interval_command} \
      --algo GVCFtyper \
+     ${args} \
      ${gvcfs_input} \
      ${dbsnp_cmd} \
      ${prefix}.vcf.gz

${args} is injected after --algo GVCFtyper, so callers pass algorithm-level options (--emit_mode, --call_conf,
--allow-old-rms-mapping-quality-annotation-data, etc.). Driver-level flags (-t, --interval) remain managed by the module since they relate to inputs the
module already owns.
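
As a rough sketch of what a caller would write, assuming a standard Nextflow `process` config scope in the calling pipeline (the file location and the value `30` are illustrative assumptions, not recommendations from this PR):

```groovy
// conf/modules.config (hypothetical location in the calling pipeline)
process {
    withName: 'SENTIEON_GVCFTYPER' {
        // Interpolated verbatim as ${args}, directly after `--algo GVCFtyper`.
        // The confidence value here is illustrative only.
        ext.args = '--call_conf 30'
    }
}
```

Leaving `ext.args` unset keeps the command unchanged, because the module falls back to an empty string.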

Reason for PR

nf-core/sarek's joint-germline path needs to call GVCFtyper in two distinct modes for very large cohorts:

  • SHARD (per interval, per sample-batch): --emit_mode all_samples to produce batch-aggregated intermediates
  • REDUCE (per interval, over batch intermediates): default emit mode for the final joint VCF
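
With this change, the two modes could plausibly be configured as below. This is a sketch only: the alias names `SENTIEON_GVCFTYPER_SHARD` and `SENTIEON_GVCFTYPER_REDUCE` are hypothetical and assume the pipeline includes the module twice under those aliases.

```groovy
// Hypothetical sarek-side config; the process aliases are assumptions for illustration.
process {
    // SHARD: per interval, per sample-batch
    withName: 'SENTIEON_GVCFTYPER_SHARD' {
        ext.args = '--emit_mode all_samples'
    }
    // REDUCE: per interval, over the batch intermediates; default emit mode
    withName: 'SENTIEON_GVCFTYPER_REDUCE' {
        ext.args = ''
    }
}
```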

Without ext.args support, sarek currently has to ship a local fork of the module (modules/local/sentieon/gvcftyper/) that's identical to the upstream module
except for these two lines. This PR removes that need.

Tests

No new tests added — the change is a no-op when task.ext.args is unset (the default), which is what every existing test exercises. They continue to pass
unchanged.

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests! — existing tests cover the empty-args path; non-empty args is pass-through
  • If you've added a new tool - have you followed the module conventions in the contribution docs — N/A, existing module
  • If necessary, include test data in your PR. — no new data
  • Remove all TODO statements.
  • Broadcast software version numbers to topic: versions — unchanged
  • Follow the naming conventions. — unchanged
  • Follow the parameters requirements. — unchanged
  • Follow the input/output options guidelines. — unchanged
  • Add a resource label - label 'process_high' already present
  • Use BioConda and BioContainers if possible to fulfil software requirements. — already configured
  • For modules:
    • nf-core modules test sentieon/gvcftyper --profile docker
    • nf-core modules test sentieon/gvcftyper --profile singularity
    • nf-core modules test sentieon/gvcftyper --profile conda

Relying on the CI matrix (license-server credentials only available from upstream branches).

Mirrors the convention used by sibling sentieon modules (haplotyper,
varcal, applyvarcal). The injection point is after `--algo GVCFtyper`
so callers pass algorithm-level options (e.g. `--emit_mode all_samples`,
`--call_conf`, `--allow-old-rms-mapping-quality-annotation-data`).

Use case: nf-core/sarek's joint-germline workflow can now configure a
two-pass sharded GVCFtyper (SHARD with `--emit_mode all_samples`,
REDUCE with default emit) for cohorts that exceed single-pass RAM
limits, without forking the module.
@SPPearce

GitHub Actions seems to be down at the moment.

asp8200 commented May 26, 2026

I see this thing with the --emit_mode was handled a bit differently in the other sentieon modules, for instance, here.

Over in Sarek, we then have the option --sentieon_haplotyper_emit_mode.
