-
Notifications
You must be signed in to change notification settings - Fork 206
WIP: Refactor pipeline downloads command #3634
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: dev
Are you sure you want to change the base?
WIP: Refactor pipeline downloads command #3634
Conversation
…environment variables
…have to be aware of the TaskID
…uarantee the task gets removed
b458c8a
to
9913e56
Compare
e7c2da7
to
c65d09f
Compare
Codecov ReportAttention: Patch coverage is
☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Current failing tests are not dure to this PR, I am fixing then in a separate one https://github.com/nf-core/tools/actions/runs/16113859276/job/45463246665?pr=3634 |
@mirpedrol Thanks for letting me know! |
… for subclassing)
This is a draft PR for refactoring
nf-core pipelines download
for readability and to use thenextflow inspect
command for container detection. It builds upon the excellent work of @MatthiasZepper, @muffato (#3509), @JulianFlesch (refactor/pipeline-download), and @nikhil (#3517).Integrated changes from contributors:
nextflow inspect
SingularityFetcher
class for fetching singularity imagesDockerFetcher
class for fetching docker imagesAdded changes
download.py
WorkflowRepo
class. I have not made significant changes to the code (yet)download/utils.py
. I will remove it entirely once we have tested thenextflow inspect
command properly.nextflow inspect
.nextflow inspect
: The only previously supported case wherenextflow inspect
fails is when there is a variable in the string which is not currently available. This was used in thestar_align
module ofrnaseq 3.7
(which is old and cannot either way be used withnextflow inspect
since it requires input parameters). The output from thenextflow inspect
will then contain anull
. Perhapsnextflow inspect
should issue a warning here, rather than require that we capture this downstream?SingularityFetcher
andDockerFetcher
: I've created a superclassContainerFetcher
with a coherent interface and some code sharing.DockerFetcher
. See below.--parallel-downloads
toparallel
to better reflect the processing of bothdocker
andsingularity
images. See docker concurrency below.Discussion points related to this
nextflow inspect
works nicely overall. However, some older pipelines require that we specify input parameters, which we then also would have to do fornextflow inspect
. I suspect that this change happened when thelib/
folder was removed, is this correct? It is in principle possible to pass (dummy or actual) input parameters with the-params-file
flag, but I not sure if it is desired and necessary for current pipelines.Left to do
Tests
The
nextflow inspect
command is required to run on a full pipeline: the command uses a Nextflow file as the entry point by default and then checks what modules are imported in any workflows or subworkflows that are used.This means that out current tests which are based on stripped modules do not work. I have therefore added a pipeline skeleton in the test data "
mock_pipeline_containers
" with the following featuresnextflow inspect
are located inmodules/local/
-- I have structured it so that it mirrors the typical structure of local modules in an nf-core pipeline. The tested URIs are taken from the containers present inmock_module_containers
passing
contains modules where the container strings are correctly captured bynextflow inspect
failing
contains modules where the container strings are not capture correctly bynextflow inspect
. It contains a single module at the moment,mock_dsl2_variable.nf
, where the container string is not correctly resolved, presumably sincenextflow inspect
does not run any code in thescript
section, meaning thatcontainer_id
is resolved tonull
. This is added to ensure that we issue nice error messages for weird container strings.nextflow inspect
on either of the entrypointsmain_passing_test.nf
ormain_failing_test.nf
.tests/download/DownloadTest.test_containers_pipeline_<container>
where<container>
is eithersingularity
ordocker
. In these tests the output fromnextflow inspect . -profile <container>
is compared to the true list of containers which is kept inmock_pipeline_containers/per_profile_output
as a JSON file.nf-core create
command -- we could definitely make this folder leaner if that is desired. Any pointers related to this are very welcome!Once we are ready to remove the legacy regex code we will be able to remove the old test data folder for module container definitions
mock_module_containers
.Discussion points related to this:
nextflow inspect
will capture them if they are configured correctly I am not sure if it is desired behavior.Downloading docker images
Support for
docker
containers is added in this PR. It works slightly different than support forsingularity
containers:docker image pull
tar
archive withdocker image save
to be transferred to the offline machineConcurrency
Parallel pulls do not make sense since there might be layer dependence between different images. However, parallel saves does make sense, since the images are only read when making the tar archive. In the code I use a single
ThreadPool
where the pulling of images is done sequentially, and the remaining threads are used for converting the images totar
. See this is done in the following lines.PR checklist
CHANGELOG.md
is updateddocs
is updated