+
Skip to content

feat: Install subworkflows with modules from different remotes #3083

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 91 commits into from
Mar 20, 2025

Conversation

jvfe
Copy link
Member

@jvfe jvfe commented Jul 29, 2024

PR checklist

  • This comment contains a description of changes (with reason)
  • CHANGELOG.md is updated
  • If you've fixed a bug or added code that should be tested, add tests!
  • Documentation in docs is updated

Addresses #1927 and #2497

Issue information

Currently, get_components_to_install is the function used to download a subworkflow's modules. It relies on simple text parsing to define where to get a subworkflow/module from and install it.

As outlined in the above issues (and in slack conversations), it is currently not possible to install subworkflows which depend on modules that originate from different remotes. The solution is to copy over required modules to a same remote in order to use them.

Implementation

At this moment the implementation is mostly a work-around/hack. I aimed to make the least possible modifications to the source code in order to make the intended behavior functional. Here's how it works:

  • It still uses the same text parsing for all modules and subworkflows that don't have a meta.yml file defined.

  • If a subworkflow defines a meta.yml, they can optionally define the git_remote and org_path keys for a component. This is based on a suggestion given on the original issue. E.g.:

components:
  - name: gatk4/mutect2
  - name: gatk4/genomicsdbimport
    org_path: custom_remote
    git_remote: https://github.com/custom_remote/modules.git

This info is used to grab the respective modules from the specified remote.
I've setup a dummy repository with a subworkflow that showcases this. If no remote is specified, it assumes the module is from the same remote as the subworkflow (as it is currently).

Trying to run nf-core subworkflows --git-remote https://github.com/jvfe/test-subworkflow-remote install get_genome_annotation with this branch of nf-core/tools should work. There's a basic test I've setup to test for this.

Known issues

All in all, I'm eager to hear what everyone thinks. I'm tagging @muffato who has been supervising and helping me in this implementation.

@jvfe jvfe changed the base branch from master to dev July 29, 2024 11:33
jvfe added 3 commits August 5, 2024 10:13
fix: Avoid duplicate downloads in nested subworkflows
* dev: (299 commits)
  Update nf_core/pipelines/create/utils.py
  [automated] Update CHANGELOG.md
  allow numbers in custom pipeline name
  [automated] Update CHANGELOG.md
  Update pre-commit hook pre-commit/mirrors-mypy to v1.11.1
  Fixed linting
  Updated changelog
  Added process_high_memory to create/lint list
  update chagelog
  use pathlib instead of os.path
  Apply suggestions from code review
  add option to skip code linters
  Update gitpod/workspace-base Docker digest to f189a41
  Update pre-commit hook pre-commit/mirrors-mypy to v1.11.0
  [automated] Update CHANGELOG.md
  Update python:3.12-slim Docker digest to 740d94a
  Update .pre-commit-config.yaml
  Update nf_core/__main__.py
  Add default version
  Automatically add a default version in pipelines_create
  ...
@nf-core nf-core deleted a comment from github-actions bot Aug 5, 2024
@awgymer
Copy link
Contributor

awgymer commented Aug 7, 2024

I'm very wary of increasing the idiosyncrasies in the code and I feel that making it such that the code returns variably either strings or dicts and thus necessitates isinstance checks is liable to be difficult to follow/trace through the code in future.

I understand the desire to simply get something working but I think that if we are going to implement this and support it we need to do it right.

I think this also indicates support only for cross-organisational modules and not cross-organisational subworkflows?

@muffato
Copy link
Member

muffato commented Aug 8, 2024

I think this also indicates support only for cross-organisational modules and not cross-organisational subworkflows?

Currently, yes

I'm very wary of increasing the idiosyncrasies in the code and I feel that making it such that the code returns variably either strings or dicts and thus necessitates isinstance checks is liable to be difficult to follow/trace through the code in future.

Once get_components_to_install is updated to return dictionaries for sub-workflows too, all the isinstance(..., dict) will evaluate to true, and all their if statements will be simplified / collapse.

I understand the desire to simply get something working but I think that if we are going to implement this and support it we need to do it right.

It's a question of time management (@jvfe is working part time on this project). We're not sure about about the handling of the default org vs the module org in nf_core/components/install.py (cf the creation of ModulesRepo instances). We didn't want to invest deeper without clearing that first. Making the change in get_components_to_install is easy, but finding the other files that need an update and making sure that all the tests still pass takes a bit longer.

Copy link
Member

@mirpedrol mirpedrol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your contribution @jvfe!
I think this is a good start 😄 I agree with @awgymer that allowing component to be both a string or a dict is confusing and it will cause problems. But I like the idea of using the information in the meta.yml to obtain the remote form which the component should be installed.
My suggestion would be to convert component to a dictionary in all cases since the beginning, with the default being the current remote.

jvfe added 3 commits August 10, 2024 14:35
* dev: (35 commits)
  don't remove creating a pipeline to test it 😶
  update textual snapshots
  update textual snapshots
  fix indentation
  Update nf_core/pipelines/create/create.py
  [automated] Update CHANGELOG.md
  [automated] Update CHANGELOG.md
  add option to exclude changelog from custom pipeline template
  add option to exclude codespaces from pipeline template
  remove multiqc images from the template docs
  and the same for subworkflows
  run tests also after pytest-migrate
  update textual snapshots
  add multiqc potion to test data
  update textual snapshots
  fix tests
  [automated] Update CHANGELOG.md
  add option to exclude multiqc from pipeline template
  rename regenerae-snapshots to update-textual-snapshots
  Apply suggestions from code review
  ...
@mashehu mashehu modified the milestones: 3.4.0, 3.3.0 Feb 24, 2025
Copy link
Contributor

@mashehu mashehu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work! Does it also work with nf-core subworkflows patch?

Co-authored-by: Matthias Hörtenhuber <mashehu@users.noreply.github.com>
@mirpedrol
Copy link
Member

Does it also work with nf-core subworkflows patch?

I have tested that it works 👍 patch is independent of the included components so it should not be affected by these changes

@awgymer
Copy link
Contributor

awgymer commented Mar 7, 2025

As I can see this PR is looking good and probably heading towards a merge I just want to gently raise my previous point

I'm unclear on what level of commitment to maintain and work around issues this introduces there is from an nf-core standpoint.

This PR adds functionality to make cross-repo subworkflows work but we are still lacking a serious CI testing solution and I'm worried about other footguns the functionality potentially opens up. I'm conscious that greater private repo support is something many (including myself!) want but I also want to ensure we achieve it with stability and long-term viability.

@muffato
Copy link
Member

muffato commented Mar 7, 2025

@awgymer What CI testing is missing ? This includes does include tests for the new cross-repo functionality. That's in fact how it was developed: @jvfe first wrote (failing) test cases and then implemented the code so that they pass.

@awgymer
Copy link
Contributor

awgymer commented Mar 7, 2025

@awgymer What CI testing is missing ?

It's a fundamental limitation of this style of subworkflows that you cannot test them easily in their remotes on CI right now and this tools functionality does nothing to address that. Once again my issue is not so much with the actual code of this PR but rather the wider implications this PR has.

@muffato
Copy link
Member

muffato commented Mar 7, 2025

Ah right. Well, it sounds like it is a good idea for another internship/mentorship ;) a nf-core command to install missing cross-org modules so that tests can then run.

@mashehu
Copy link
Contributor

mashehu commented Mar 10, 2025

how about a CI test, where we create a test pipeline and build a mixed subworkflow there?

@muffato
Copy link
Member

muffato commented Mar 10, 2025

It's there already: https://github.com/nf-core/tools/pull/3083/files#diff-60cea8a0cf3a284ef26219e6dc9b9b2b55ebb25214cb520b0ae38ef6179436d7R87

It uses modules/subworkflows across nf-core/modules and https://github.com/nf-core-test/modules

@mirpedrol
Copy link
Member

Hello!
After a chat with the #infrastructure team, we think that this PR can be merged, as the missing CI testing is not affecting the nf-core modules repo.
A solution for this can be discussed afterwards. For instance, one option would be to create a pipeline, install the subworkflow, and test it there.

@jvfe
Copy link
Member Author

jvfe commented Mar 19, 2025

Hi everyone sorry for my absence as of late, but I'm glad to see the feature getting some traction and coming closer to a merge!

A solution for this can be discussed afterwards. For instance, one option would be to create a pipeline, install the subworkflow, and test it there.

This seems like an adequate solution to me, although it'd be better implemented in a separate PR.

The current CI failures seem to be caused by the nf-schema plugin and don't seem related to the feature, so not sure if there's a proper solution from this side.

@mirpedrol
Copy link
Member

The current CI failures seem to be caused by the nf-schema plugin and don't seem related to the feature, so not sure if there's a proper solution from this side.

Hi @jvfe, the fail is due to the edge release of Nextflow, the Nextflow team is working on a fix. Let's wait to see if they fix it soon, otherwise, I will consider merging without these tests (as they were passing before). Before the hackathon would be a good timing for people to test it out and find possible bugs

@mirpedrol mirpedrol merged commit 6de7d93 into nf-core:dev Mar 20, 2025
94 checks passed
@muffato
Copy link
Member

muffato commented Mar 20, 2025

Thank you everyone !
happy

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
7 participants
点击 这是indexloc提供的php浏览器服务,不要输入任何密码和下载