
Template: Resume function can lead to inconsistencies in software version reporting #3653

Open
@JohannesKersting

Description of the bug

In the current nf-core template, software versions are collected without disabling the caching of .collectFile():

    // Collate and save software versions
    softwareVersionsToYAML(ch_versions)
        .collectFile(
            storeDir: "${params.outdir}/pipeline_info",
            name: 'nf_core_'  +  'diseasemodulediscovery_software_'  + 'mqc_'  + 'versions.yml',
            sort: true,
            newLine: true
        ).set { ch_collated_versions }

This can lead to inconsistencies (missing/additional processes listed) in combination with multiple pipeline runs using the Nextflow resume function.

A fix would be to disable the collectFile caching:

    // Collate and save software versions
    softwareVersionsToYAML(ch_versions)
        .collectFile(
            storeDir: "${params.outdir}/pipeline_info",
            name: 'nf_core_'  +  'diseasemodulediscovery_software_'  + 'mqc_'  + 'versions.yml',
            sort: true,
            newLine: true,
            cache: false
        ).set { ch_collated_versions }

Alternatively, omit storeDir so that the collected file is cached in the work directory instead.
The same problem can affect any file created with collectFile that is stored outside the work directory with caching enabled.

Command used and terminal output

# run pipeline (first run)
nextflow run diseasemodulediscovery -r dev -profile test,docker --outdir results

# delete result directory
rm -r results

# run pipeline again with resume, but skip a step (second run)
nextflow run diseasemodulediscovery -r dev -profile test,docker --outdir results --skip_gprofiler -resume

# run full pipeline again with resume (third run)
nextflow run diseasemodulediscovery -r dev -profile test,docker --outdir results -resume

# The third run reuses the cached gprofiler results from the first run (cached in work), but the cached software versions file from the second run (cached in results).
# As a result, the gprofiler software versions are missing from the MultiQC report of the third run, even though the tool was part of that run.
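
To confirm the symptom after each of the three runs, a small check along these lines might help. It is only a sketch: the helper name check_versions is hypothetical, the file path is derived from the storeDir and name arguments shown above, and it assumes the string "gprofiler" appears somewhere in the collated YAML whenever the tool's versions were collected.

```shell
# Hypothetical helper: report whether a tool name appears in a collated versions file.
# Usage: check_versions <versions.yml> <tool-name>
check_versions() {
    local versions_file="$1"
    local tool="$2"

    # Fail early if the versions file does not exist (e.g. results was deleted).
    if [ ! -f "$versions_file" ]; then
        echo "missing file: $versions_file"
        return 2
    fi

    # -q: no output, -i: case-insensitive match on the tool name (assumed spelling).
    if grep -qi -- "$tool" "$versions_file"; then
        echo "listed: $tool"
    else
        echo "NOT listed: $tool"
        return 1
    fi
}
```

Running `check_versions results/pipeline_info/nf_core_diseasemodulediscovery_software_mqc_versions.yml gprofiler` after the third run should then show whether the stale file from the second run was reused.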

System information

Nextflow version: 25.04.4
Hardware: Desktop
Executor: local
OS: Ubuntu Linux
Version of nf-core/tools: 3.3.1

Metadata

Assignees: none
Labels: bug (Something isn't working)
Type: none
Projects: none
Milestone: none
Development: no branches or pull requests