Bulk MultiQC reporting as a pipeline-installed Python module #450

Open
keiran-rowell-unsw wants to merge 22 commits into nf-core:dev from Australian-Structural-Biology-Computing:multiqc_bulk_report

@keiran-rowell-unsw
Contributor

@keiran-rowell-unsw keiran-rowell-unsw commented Jan 28, 2026

Placeholder draft PR for implementing 'bulk' MultiQC reporting as per #439.

The recommendation was to deploy it as a setup.py module, as it's pipeline-specific and won't work out-of-the-box on the underlying tools (see MultiQC PR).

Still testing on a local UNSW HPC and then OoD-proteinfold deployment. Works fine on pre-computed input.

@keiran-rowell-unsw keiran-rowell-unsw changed the title from 'Bulk MultiQC reporting as a module' to 'Bulk MultiQC reporting as a pipeline-installed Python module' Jan 28, 2026
@github-actions

github-actions Bot commented Jan 28, 2026

nf-core pipelines lint overall result: Passed ✅ ⚠️

Posted for pipeline commit 8d3bf30

+| ✅ 327 tests passed       |+
#| ❔   4 tests were ignored |#
#| ❔   1 tests had warnings |#
!| ❗  33 tests had warnings |!
Details

❗ Test warnings:

  • files_exist - File not found: conf/igenomes.config
  • files_exist - File not found: conf/igenomes_ignored.config
  • pipeline_todos - TODO string in base.config: Check the defaults for all processes
  • pipeline_todos - TODO string in base.config: Customise requirements for specific processes.
  • pipeline_todos - TODO string in methods_description_template.yml: #Update the HTML below to your preferred methods description, e.g. add publication citation for this pipeline
  • pipeline_todos - TODO string in main.nf: Optionally add in-text citation tools to this list.
  • pipeline_todos - TODO string in main.nf: Optionally add bibliographic entries to this list.
  • pipeline_todos - TODO string in main.nf: Only uncomment below if logic in toolCitationText/toolBibliographyText has been filled!
  • pipeline_todos - TODO string in usage.md: Add documentation about anything specific to running your pipeline. For general topics, please point to (and add to) the main nf-core website.
  • pipeline_todos - TODO string in nextflow.config: Specify any additional parameters here
  • schema_description - No description provided in schema for parameter: rosettafold2na_uniref30_link
  • schema_description - No description provided in schema for parameter: rosettafold2na_bfd_link
  • schema_description - No description provided in schema for parameter: rosettafold2na_pdb100_link
  • schema_description - No description provided in schema for parameter: rosettafold2na_weights_link
  • schema_description - No description provided in schema for parameter: rfam_full_region_link
  • schema_description - No description provided in schema for parameter: rfam_cm_link
  • schema_description - No description provided in schema for parameter: rnacentral_rfam_annotations_link
  • schema_description - No description provided in schema for parameter: rnacentral_id_mapping_link
  • schema_description - No description provided in schema for parameter: rnacentral_sequences_link
  • schema_description - No description provided in schema for parameter: rosettafold2na_uniref30_path
  • schema_description - No description provided in schema for parameter: rosettafold2na_bfd_path
  • schema_description - No description provided in schema for parameter: rosettafold2na_pdb100_path
  • schema_description - No description provided in schema for parameter: rosettafold2na_weights_path
  • local_component_structure - post_processing.nf in subworkflows/local should be moved to a SUBWORKFLOW_NAME/main.nf structure
  • local_component_structure - prepare_rosettafold_all_atom_dbs.nf in subworkflows/local should be moved to a SUBWORKFLOW_NAME/main.nf structure
  • local_component_structure - prepare_rosettafold2na_dbs.nf in subworkflows/local should be moved to a SUBWORKFLOW_NAME/main.nf structure
  • local_component_structure - prepare_alphafold3_dbs.nf in subworkflows/local should be moved to a SUBWORKFLOW_NAME/main.nf structure
  • local_component_structure - prepare_colabfold_dbs.nf in subworkflows/local should be moved to a SUBWORKFLOW_NAME/main.nf structure
  • local_component_structure - aria2_uncompress.nf in subworkflows/local should be moved to a SUBWORKFLOW_NAME/main.nf structure
  • local_component_structure - prepare_esmfold_dbs.nf in subworkflows/local should be moved to a SUBWORKFLOW_NAME/main.nf structure
  • local_component_structure - prepare_helixfold3_dbs.nf in subworkflows/local should be moved to a SUBWORKFLOW_NAME/main.nf structure
  • local_component_structure - prepare_boltz_dbs.nf in subworkflows/local should be moved to a SUBWORKFLOW_NAME/main.nf structure
  • local_component_structure - prepare_alphafold2_dbs.nf in subworkflows/local should be moved to a SUBWORKFLOW_NAME/main.nf structure

❔ Tests ignored:

❔ Tests fixed:

✅ Tests passed:

Run details

  • nf-core/tools version 3.5.1
  • Run at 2026-01-28 03:15:25

@keiran-rowell-unsw
Contributor Author

Running pip install inside a container will fail because the install locations are not writable.
The quick fix is a venv, which is at least portable. Creating a new container for each module would delay the release.

Also, since it's pipeline-specific, I can't post-process with 'multiqc .'.
Each module will need altering, with lines like this:
ch_multiqc = ch_multiqc.mix(BOLTZ.out.multiqc_report)

Leaving this for now, because the refactor effort would fit better into v2.1 alongside neater GENERATE_REPORT utils.
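
As a rough illustration of the venv workaround mentioned in this comment, the sketch below creates a virtual environment inside a writable working directory and installs a local plugin into it. It is only a sketch under assumptions: the function names, the directory name `multiqc_plugin_venv`, and the choice to expose system site-packages (so that a MultiQC already present in the container stays importable) are all hypothetical and not taken from the PR.

```python
import subprocess
import sys
import venv
from pathlib import Path


def create_plugin_venv(work_dir, with_pip=True):
    """Create a venv under a writable directory and return its interpreter path.

    Assumption: system site-packages are exposed so that packages already
    installed in the container (e.g. MultiQC) remain importable.
    """
    env_dir = Path(work_dir) / "multiqc_plugin_venv"  # hypothetical name
    builder = venv.EnvBuilder(
        system_site_packages=True,
        with_pip=with_pip,
        clear=True,  # start from a clean directory on re-runs
    )
    builder.create(env_dir)
    bin_dir = "Scripts" if sys.platform == "win32" else "bin"
    return env_dir / bin_dir / "python"


def install_plugin(python_exe, plugin_dir):
    """Install a local plugin into the venv without touching read-only paths.

    --no-deps is an assumption: dependencies are expected to come from the
    container's own site-packages.
    """
    subprocess.run(
        [str(python_exe), "-m", "pip", "install", "--no-deps", str(plugin_dir)],
        check=True,
    )
```

A process script could call `create_plugin_venv(".")` in the task work directory, then `install_plugin(...)` with the path to the pipeline checkout, before invoking MultiQC with the returned interpreter.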


Copilot AI left a comment

Pull request overview

Implements “bulk” ProteinFold MultiQC reporting by packaging a pipeline-specific MultiQC plugin (Python module) and wiring the pipeline’s MULTIQC process to install/use it when generating reports.

Changes:

  • Add a Python package (multiqc_proteinfold) providing a MultiQC module that parses pipeline TSV metrics and generates general stats + pLDDT line plots.
  • Add setup.py with MultiQC entry-point registration for the plugin.
  • Update the nf-core multiqc module to install the local plugin at runtime / via conda environment config.
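
For readers unfamiliar with how MultiQC discovers plugin modules, the following is a minimal sketch of what a setup.py with entry-point registration might look like. The `multiqc.modules.v1` entry-point group is the one MultiQC documents for plugin modules; the version string, the module key `proteinfold`, and the class name `MultiqcModule` are assumptions for illustration and are not confirmed by this PR.

```python
# Sketch of the metadata a plugin's setup.py would pass to setuptools.
# Hypothetical values are marked; only the package and file names
# (multiqc_proteinfold, proteinfold.py) come from the PR summary.
SETUP_KWARGS = {
    "name": "multiqc_proteinfold",
    "version": "0.1.0",  # hypothetical version
    "packages": ["multiqc_proteinfold"],
    "include_package_data": True,
    "install_requires": ["multiqc"],
    "entry_points": {
        # MultiQC scans this entry-point group to find plugin modules.
        "multiqc.modules.v1": [
            # "<module key> = <import path>:<class>"; the class name is assumed.
            "proteinfold = multiqc_proteinfold.proteinfold:MultiqcModule",
        ],
    },
}


def main():
    """Hand the metadata to setuptools; a real setup.py would call this at module level."""
    from setuptools import setup

    setup(**SETUP_KWARGS)
```

Once a package registered this way is installed into the same environment as MultiQC, the module should be picked up without changes to MultiQC itself, which is what makes a pipeline-local install feasible.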

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 9 comments.

Show a summary per file
| File | Description |
| --- | --- |
| setup.py | Defines the plugin package + MultiQC entry point for module discovery. |
| multiqc_proteinfold/proteinfold.py | Implements the MultiQC module parsing metrics and adding report sections. |
| multiqc_proteinfold/multiqc_config.yaml | Adds MultiQC search pattern config for proteinfold TSVs and logo settings. |
| multiqc_proteinfold/__init__.py | Exposes the module and package version. |
| modules/nf-core/multiqc/main.nf | Installs the plugin before running multiqc. |
| modules/nf-core/multiqc/environment.yml | Attempts to add pip-based installation of the local plugin into the conda env. |
Comments suppressed due to low confidence (2)

modules/nf-core/multiqc/environment.yml:10

  • The conda environment.yml has invalid syntax for pip-installed dependencies (- pip followed by a nested list). Conda expects a pip: mapping (e.g., - pip: with a list under it). As written, environment creation will fail.
  - bioconda::multiqc=1.32
  - pip
    - ${projectDir} # Install proteinfold_multiqc as a local plugin

modules/nf-core/multiqc/environment.yml:10

  • ${projectDir} in environment.yml will not be interpolated by conda/mamba, so it won’t resolve to the pipeline path. If you need to install the local plugin, do it in the process script (or bake it into the container/image) rather than relying on a conda env file variable.
  - pip
    - ${projectDir} # Install proteinfold_multiqc as a local plugin


Comment thread multiqc_proteinfold/proteinfold.py Outdated
Comment thread multiqc_proteinfold/proteinfold.py Outdated
Comment thread multiqc_proteinfold/proteinfold.py Outdated
Comment thread multiqc_proteinfold/proteinfold.py Outdated
Comment thread multiqc_proteinfold/proteinfold.py Outdated
Comment thread modules/nf-core/multiqc/main.nf Outdated
Comment thread multiqc_proteinfold/multiqc_config.yaml Outdated
Comment thread setup.py
Comment thread multiqc_proteinfold/proteinfold.py Outdated
keiran-rowell-unsw and others added 2 commits March 30, 2026 15:08
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@nf-core-bot
Member

Warning

Newer version of the nf-core template is available.

Your pipeline is using an old version of the nf-core template: 3.5.1.
Please update your pipeline to the latest version.

For more documentation on how to update your pipeline, please see the nf-core documentation and Synchronisation documentation.

@keiran-rowell-unsw keiran-rowell-unsw marked this pull request as ready for review March 30, 2026 05:25
@keiran-rowell-unsw keiran-rowell-unsw self-assigned this Mar 30, 2026
@keiran-rowell-unsw keiran-rowell-unsw added the enhancement Improvement for existing functionality label Mar 30, 2026
@keiran-rowell-unsw keiran-rowell-unsw added this to the 2.1.0 milestone Mar 30, 2026
@keiran-rowell-unsw
Contributor Author

@JoseEspinosa the CI seems to be failing on the bioflow metromap generation (#426), but I don't understand this automated metromap system.

@keiran-rowell-unsw
Contributor Author

The proteinfold MultiQC charting from the .tsvs should be generally valuable, but it suffers a bit from GENERATE_REPORT not having been made more robust (that was critical for correct value parsing, but per #484 I can revive my code in the Aus repo). It would all benefit from non-hardcoded, non-positional value access for grabbing metrics (#373).
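
To make the point about non-positional value access concrete, here is a small sketch that reads a metrics TSV by header name rather than by column index, so that reordering or adding columns does not silently change which value is parsed. The column names (`id`, `mean_plddt`, `ptm`) and the function name are hypothetical; the actual TSV layout produced by the pipeline is not shown in this thread.

```python
import csv
import io


def read_metrics(handle, id_column="id", wanted=("mean_plddt", "ptm")):
    """Return {sample_id: {metric: float}} by looking columns up by name.

    Column names here are illustrative assumptions, not the pipeline's schema.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    header = reader.fieldnames or []
    missing = [name for name in (id_column, *wanted) if name not in header]
    if missing:
        # Fail loudly instead of reading the wrong column.
        raise KeyError(f"missing columns: {missing}")
    return {
        row[id_column]: {name: float(row[name]) for name in wanted}
        for row in reader
    }


# Column order differs from `wanted`, yet the values are still matched by name.
example = "id\tptm\tmean_plddt\nT1\t0.82\t91.5\nT2\t0.40\t63.0\n"
metrics = read_metrics(io.StringIO(example))
```

The trade-off is that the parser now depends on stable header names, so a renamed column raises an error up front rather than yielding a misattributed metric.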

@JoseEspinosa
Member

@JoseEspinosa the CI seems to be failing on the bioflow metromap generation (#426), but I don't understand this automated metromap system.

Just ignore it for the moment

keiran-rowell-unsw added a commit to Australian-Structural-Biology-Computing/proteinfold that referenced this pull request Mar 31, 2026
@jscgh jscgh modified the milestones: 2.1.0, 3.0.0 Apr 23, 2026

Labels

enhancement Improvement for existing functionality

5 participants