
Conversation

ewels
Member

@ewels ewels commented Jul 2, 2025

Summary

  • Adds cache: false parameter to collectFile() call for software versions collection in pipeline template
  • Prevents inconsistencies in software version reporting when using the Nextflow resume function

Problem

When collectFile() uses storeDir with caching enabled, software version reports can list missing or extra processes across multiple pipeline runs that use resume. This happens because:

  1. The first run caches software versions in the results directory
  2. A second run with different parameters (e.g., --skip_gprofiler) creates new cached versions
  3. A third run with resume may use inconsistent cached data from the different earlier runs

Solution

Adding cache: false to the collectFile() call ensures software versions are always collected fresh and consistent with the actual processes that ran.
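
As a rough sketch of what this change looks like in a workflow, assuming the versions collection has approximately the shape below. The helper `softwareVersionsToYAML`, the input channel `ch_versions`, the `params.outdir` path, and the output file name are assumptions for illustration and are not copied from the diff; only `collectFile()`, `storeDir`, and `cache: false` come from the description above.

```groovy
// Sketch only: helper, channel names, and file name are assumed, not taken from the diff.
softwareVersionsToYAML(ch_versions)
    .collectFile(
        storeDir: "${params.outdir}/pipeline_info", // assumed output location
        name: 'software_versions.yml',              // hypothetical file name
        sort: true,
        newLine: true,
        cache: false // do not reuse collectFile's cached state when running with -resume
    )
    .set { ch_collated_versions }
```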

Test plan

  • Verify template syntax is correct
  • Pre-commit hooks pass
  • CI tests pass
  • Generated pipelines work correctly with resume functionality

Fixes #3653

🤖 Generated with Claude Code

Add cache: false to collectFile() call for software versions collection
to prevent inconsistencies when using Nextflow resume function.

When collectFile uses storeDir with caching enabled, it can lead to
missing or additional processes in software version reports across
multiple pipeline runs with resume.

Fixes nf-core#3653

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>


@ewels ewels changed the base branch from main to dev July 2, 2025 14:39
@ewels
Member Author

ewels commented Jul 2, 2025

@nf-core-bot changelog

@JohannesKersting

@ewels Thanks for creating the pull request! This fixed the issue for me.
However, it will make resuming/caching of downstream processes (multiqc) also impossible, right?


@ewels
Member Author

ewels commented Jul 2, 2025

> However, it will make resuming/caching of downstream processes (multiqc) also impossible, right?

Yes. Any consumer of the ch_collated_versions channel, and any process downstream of those consumers, will have its cache broken on every run.

Generally there aren't any processes downstream of MultiQC; it's typically used as a final step that summarises the run. But you're right that this is something we should be intentional about. I think we used to have the cache disabled for MultiQC anyway, but I can't find that config now, so maybe it was dropped. I'll raise this in Slack on the #tools channel.

@ewels ewels marked this pull request as draft July 2, 2025 15:58
@edmundmiller
Contributor

edmundmiller commented Jul 2, 2025

@ewels does this also fix #3110?

Or does it make it worse? 😱

@ewels
Member Author

ewels commented Jul 2, 2025

It makes it worse.

@mahesh-panchal
Member

Why not make it a native process?

@maxulysse
Member

I'm assuming this will be greatly improved by the use of topics and workflow output.

@mahesh-panchal
Member

Something I just discovered in my own pipeline: many pipelines use .first() to take the first copy of the versions.yml. However, the versions.yml that gets selected out of, say, 5 runs of the same process is not always the same. This means the path to the versions.yml may change, so the input set changes, which also prevents caching.

@awgymer
Contributor

awgymer commented Sep 25, 2025

I have found that in my resumed pipeline the software versions YAML simply isn't being updated at all. I'm not sure if this is related to the bug/behaviour that this PR addresses, but I'm leaving a comment here so it's noted somewhere.


Development

Successfully merging this pull request may close these issues.

Template: Resume function can lead to inconsistencies in software version reporting
