+
Skip to content

WIP: Refactor pipeline downloads command #3634

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 68 commits into
base: dev
Choose a base branch
from

Conversation

ErikDanielsson
Copy link
Contributor

@ErikDanielsson ErikDanielsson commented Jun 23, 2025

This is a draft PR for refactoring nf-core pipelines download for readability and to use the nextflow inspect command for container detection. It builds upon the excellent work of @MatthiasZepper, @muffato (#3509), @JulianFlesch (refactor/pipeline-download), and @nikhil (#3517).

Integrated changes from contributors:

  • Base container fetching primarily on nextflow inspect
  • Add SingularityFetcher class for fetching singularity images
  • Add DockerFetcher class for fetching docker images

Added changes

  • Refactor download.py
    • Split out the WorkflowRepo class. I have not made significant changes to the code (yet)
    • (Re)move regex parsing of container strings. This is still present but moved to download/utils.py. I will remove it entirely once we have tested the nextflow inspect command properly.
    • Add checks and nice reporting for required Nextflow version for nextflow inspect.
    • Add checks for weird container strings from nextflow inspect: The only previously supported case where nextflow inspect fails is when there is a variable in the string which is not currently available. This was used in the star_align module of rnaseq 3.7 (which is old and cannot either way be used with nextflow inspect since it requires input parameters). The output from the nextflow inspect will then contain a null. Perhaps nextflow inspect should issue a warning here, rather than require that we capture this downstream?
  • Refactor SingularityFetcher and DockerFetcher: I've created a superclass ContainerFetcher with a coherent interface and some code sharing.
  • Change concurrency model for DockerFetcher. See below.
  • Changed --parallel-downloads to parallel to better reflect the processing of both docker and singularity images. See docker concurrency below.

Discussion points related to this

  • I've tested quite a few pipelines, and nextflow inspect works nicely overall. However, some older pipelines require that we specify input parameters, which we then also would have to do for nextflow inspect. I suspect that this change happened when the lib/ folder was removed, is this correct? It is in principle possible to pass (dummy or actual) input parameters with the -params-file flag, but I not sure if it is desired and necessary for current pipelines.

Left to do

  • Remove regex code and corresponding tests
  • Add more tests... as always

Tests

The nextflow inspect command is required to run on a full pipeline: the command uses a Nextflow file as the entry point by default and then checks what modules are imported in any workflows or subworkflows that are used.
This means that out current tests which are based on stripped modules do not work. I have therefore added a pipeline skeleton in the test data "mock_pipeline_containers" with the following features

  • The tested container URI that nextflow inspect are located in modules/local/ -- I have structured it so that it mirrors the typical structure of local modules in an nf-core pipeline. The tested URIs are taken from the containers present in mock_module_containers
    • The subdirectory passing contains modules where the container strings are correctly captured by nextflow inspect
    • The subdirectory failing contains modules where the container strings are not capture correctly by nextflow inspect. It contains a single module at the moment, mock_dsl2_variable.nf, where the container string is not correctly resolved, presumably since nextflow inspect does not run any code in the script section, meaning that container_id is resolved to null. This is added to ensure that we issue nice error messages for weird container strings.
  • Tested are performed by running nextflow inspect on either of the entrypoints main_passing_test.nf or main_failing_test.nf.
  • Correct behaviour is tested in tests/download/DownloadTest.test_containers_pipeline_<container> where <container> is either singularity or docker. In these tests the output from nextflow inspect . -profile <container> is compared to the true list of containers which is kept in mock_pipeline_containers/per_profile_output as a JSON file.
  • I have left much of the Nextflow code added by the nf-core create command -- we could definitely make this folder leaner if that is desired. Any pointers related to this are very welcome!

Once we are ready to remove the legacy regex code we will be able to remove the old test data folder for module container definitions mock_module_containers.

Discussion points related to this:

  • Should we support container definitions in configs? While nextflow inspect will capture them if they are configured correctly I am not sure if it is desired behavior.
    • Since we will likely make the container definitions more strict (Move to seqera containers 1 and Move to seqera containers 2), I would presume this is the case, but then we need to decide on backwards compatibility.
    • If we want to support it, then we need to port the tests to ensure that containers are capture correctly

Downloading docker images

Support for docker containers is added in this PR. It works slightly different than support for singularity containers:

  • Docker images are always pulled from a registry with docker image pull
  • Images that have been pulled (or otherwise available locally) can be saved to a tar archive with docker image save to be transferred to the offline machine

Concurrency
Parallel pulls do not make sense since there might be layer dependence between different images. However, parallel saves does make sense, since the images are only read when making the tar archive. In the code I use a single ThreadPool where the pulling of images is done sequentially, and the remaining threads are used for converting the images to tar. See this is done in the following lines.

PR checklist

  • This comment contains a description of changes (with reason)
  • CHANGELOG.md is updated
  • If you've fixed a bug or added code that should be tested, add tests!
  • Documentation in docs is updated

muffato and others added 30 commits March 25, 2025 14:54
@ErikDanielsson ErikDanielsson force-pushed the download-refactor-again branch from b458c8a to 9913e56 Compare June 25, 2025 11:17
@ErikDanielsson ErikDanielsson force-pushed the download-refactor-again branch from e7c2da7 to c65d09f Compare June 25, 2025 11:50
Copy link

codecov bot commented Jun 25, 2025

Codecov Report

Attention: Patch coverage is 61.57480% with 488 lines in your changes missing coverage. Please review.

Project coverage is 76.75%. Comparing base (28b76a4) to head (05adb4d).

Files with missing lines Patch % Lines
nf_core/pipelines/download/download.py 65.96% 161 Missing ⚠️
nf_core/pipelines/download/docker.py 20.12% 123 Missing ⚠️
nf_core/pipelines/download/singularity.py 42.70% 110 Missing ⚠️
nf_core/pipelines/download/workflow_repo.py 67.58% 47 Missing ⚠️
nf_core/pipelines/download/container_fetcher.py 73.62% 24 Missing ⚠️
nf_core/pipelines/download/utils.py 89.20% 23 Missing ⚠️

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@mirpedrol
Copy link
Member

Current failing tests are not dure to this PR, I am fixing then in a separate one https://github.com/nf-core/tools/actions/runs/16113859276/job/45463246665?pr=3634

@ErikDanielsson
Copy link
Contributor Author

@mirpedrol Thanks for letting me know!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
download nf-core download WIP Work in progress
Projects
Status: No status
Development

Successfully merging this pull request may close these issues.

5 participants
点击 这是indexloc提供的php浏览器服务,不要输入任何密码和下载