Fix collectFile cache issue causing software version inconsistencies #3654

Draft: ewels wants to merge 2 commits into base: dev

Member

@ewels ewels commented Jul 2, 2025

Summary

  • Adds cache: false parameter to collectFile() call for software versions collection in pipeline template
  • Prevents inconsistencies in software version reporting when using Nextflow resume function

Problem

When collectFile() uses storeDir with caching enabled, it can lead to missing or additional processes listed in software version reports across multiple pipeline runs with resume. This happens because:

  1. First run caches software versions in results directory
  2. Second run with different parameters (e.g., --skip_gprofiler) creates new cached versions
  3. Third run with resume may use inconsistent cached data from different runs

Solution

Adding cache: false to the collectFile() call ensures software versions are always collected fresh and consistent with the actual processes that ran.
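
As a rough sketch of what this option does, the minimal script below collects a few version strings into one file with caching turned off. It is not the template code: the channel contents, file name, and output directory are placeholders assumed for illustration, and only `collectFile()`, `storeDir`, and `cache: false` come from the description above.

```nextflow
// Minimal, self-contained sketch (assumed names, not the template code itself).
workflow {
    // Placeholder stand-in for the version strings gathered during a run
    ch_versions = Channel.of('TOOL_A: 1.0', 'TOOL_B: 2.0')

    ch_versions
        .collectFile(
            name: 'software_versions.yml',      // hypothetical file name
            storeDir: 'results/pipeline_info',  // hypothetical output directory
            sort: true,
            newLine: true,
            cache: false                        // do not reuse a cached result for this operator on -resume
        )
        .set { ch_collated_versions }

    // Print the path of the collated file
    ch_collated_versions.view()
}
```

With `cache: false`, the operator rebuilds the file from the items emitted in the current run rather than reusing what an earlier run left behind.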

Test plan

  • Verify template syntax is correct
  • Pre-commit hooks pass
  • CI tests pass
  • Generated pipelines work correctly with resume functionality

Fixes #3653

🤖 Generated with Claude Code

Add cache: false to collectFile() call for software versions collection
to prevent inconsistencies when using Nextflow resume function.

When collectFile uses storeDir with caching enabled, it can lead to
missing or additional processes in software version reports across
multiple pipeline runs with resume.

Fixes nf-core#3653

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

@ewels ewels changed the base branch from main to dev July 2, 2025 14:39
@ewels
Member Author

ewels commented Jul 2, 2025

@nf-core-bot changelog

@JohannesKersting

@ewels Thanks for creating the pull request! This fixed the issue for me.
However, it will make resuming/caching of downstream processes (multiqc) also impossible, right?

@ewels
Member Author

ewels commented Jul 2, 2025

> However, it will make resuming/caching of downstream processes (multiqc) also impossible, right?

Yes. Any consumer of the ch_collated_versions channel, and any process downstream of those consumers, will have its cache invalidated on every run.

Generally there aren't any processes downstream of MultiQC, it's typically used as a final step that summarises the run, but you're right that it is something that we should be intentional about. I think that we used to have the cache disabled for MultiQC anyway, but I can't find that config now so maybe it was dropped. I'll raise this in Slack on the #tools channel.
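
For reference, being explicit about caching for a single process could look something like the config fragment below. This is only a hedged illustration of the `cache` process directive; it is not the config referred to above, and the process name `MULTIQC` is assumed.

```nextflow
// nextflow.config fragment (hypothetical): opt one process out of caching explicitly
process {
    withName: 'MULTIQC' {
        cache = false  // this process is re-executed on every run, even with -resume
    }
}
```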

@ewels ewels marked this pull request as draft July 2, 2025 15:58
@edmundmiller
Contributor

edmundmiller commented Jul 2, 2025

@ewels does this also fix #3110?

Or does it make it worse? 😱

@ewels
Member Author

ewels commented Jul 2, 2025

It makes it worse.

@mahesh-panchal
Member

Why not make it a native process?

@maxulysse
Member

I'm assuming this will be greatly improved by usage of topics and workflow output

@mahesh-panchal
Member

Something I just discovered in my own pipeline: many pipelines use .first() to take the first copy of the versions.yml. However, the versions.yml that gets selected out of, say, 5 runs of the same process is not always the same one. The path to the versions.yml may therefore change between runs, so the input set changes, which also prevents caching.
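
A small sketch of the pattern described here, assuming a toy process named `TOOL` (hypothetical) that writes a versions.yml for every task:

```nextflow
// Toy process (hypothetical) that writes a versions.yml for every task
process TOOL {
    input:
    val sample

    output:
    path 'versions.yml', emit: versions

    script:
    """
    echo 'TOOL: 1.0' > versions.yml
    """
}

workflow {
    // Five tasks of the same process, each producing its own versions.yml
    TOOL(Channel.of(1, 2, 3, 4, 5))

    // .first() emits whichever task's versions.yml arrives first. Tasks finish in
    // no guaranteed order, so the work-directory path printed here can differ between runs.
    TOOL.out.versions
        .first()
        .view { f -> "selected: ${f}" }
}
```

Running this more than once and comparing the printed paths is one way to check whether the selected file stays the same across runs.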

Successfully merging this pull request may close these issues.

Template: Resume function can lead to inconsistencies in software version reporting
6 participants