
Conversation

mashehu
Contributor

@mashehu mashehu commented Nov 7, 2024

Close #1383

@mashehu mashehu requested a review from sateeshperi November 7, 2024 13:26

codecov bot commented Nov 7, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 76.50%. Comparing base (71b26a4) to head (f630d39).
Report is 9 commits behind head on dev.


@mashehu mashehu requested a review from nictru November 7, 2024 15:37
nictru and others added 2 commits November 7, 2024 16:38
Co-authored-by: Sateesh_Peri <33637490+sateeshperi@users.noreply.github.com>
Co-authored-by: Sateesh_Peri <33637490+sateeshperi@users.noreply.github.com>
@nictru
Contributor

nictru commented Nov 7, 2024

So the flow of information in the scdownstream pipeline looks as follows:

  1. Users provide the gpu profile
  2. The gpu profile prepares the containerization tools for GPU mounting and sets the hidden pipeline parameter use_gpu to true -> can be used for if-clauses in workflows
  3. Processes with GPU support get the process_gpu label -> can be used to handle all GPU-enabled processes in a certain way. Different executors need different tweaking to handle tasks from these processes correctly. I added a bit of documentation here.
  4. This section makes sure that processes have an ext variable which reflects whether they should use GPU or not. This can be useful if the same module supports usage both with and without GPU. Example: cellbender

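The flow in steps 1 and 2 can be sketched as a config profile. This is a hedged sketch based on the scdownstream implementation; the exact runtime options are assumptions and may differ from the final template:

```groovy
// Sketch of the gpu profile from steps 1-2 (illustrative, modelled on
// scdownstream's nextflow.config; option values are assumptions).
profiles {
    gpu {
        use_gpu                = true                               // hidden flag from step 2
        docker.runOptions      = '-u $(id -u):$(id -g) --gpus all'  // expose GPUs to Docker
        apptainer.runOptions   = '--nv'                             // NVIDIA support in Apptainer
        singularity.runOptions = '--nv'                             // NVIDIA support in Singularity
    }
}
```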
This implementation is the best I could come up with so far, but not many really "senior" people have had a look at it AFAIK. Maybe we can discuss which elements of the suggested approach should be part of the template and where we can perform some improvements.

@nictru
Contributor

nictru commented Nov 7, 2024

Another thought: Should this not be an optional part of the template?

@mashehu
Contributor Author

mashehu commented Nov 7, 2024

It's teeny-tiny enough that we might keep it in, imo, similar to arm... but also not 100% sure

@nictru
Contributor

nictru commented Nov 7, 2024

But is the goal to only add the profile to the template?

I think it would at least be nice to have a common structure for all GPU-enabled modules, even if it only means adding the process_gpu label

@sateeshperi sateeshperi requested a review from GallVp November 7, 2024 18:12
@GallVp
Member

GallVp commented Nov 7, 2024

Thank you @nictru

  1. Users provide the gpu profile

The gpu profile being added to the template does not have the use_gpu configuration variable. I can see that you have added it to the profile in the pipeline: https://github.com/nf-core/scdownstream/blob/3231971f309d1ac025e7180b69852c3637c975dd/nextflow.config#L182

I don't think we need a separate configuration variable.

  1. The gpu profile prepares the containerization tools for GPU mounting and sets the hidden pipeline parameter use_gpu to true -> can be used for if-clauses in workflows

In the scdownstream pipeline, use_gpu is defined as a pipeline parameter (https://github.com/nf-core/scdownstream/blob/3231971f309d1ac025e7180b69852c3637c975dd/nextflow.config#L40) and a configuration variable (https://github.com/nf-core/scdownstream/blob/3231971f309d1ac025e7180b69852c3637c975dd/nextflow.config#L182). These two are different, so setting gpu in the -profile will not set the use_gpu pipeline parameter. It will only set the configuration variable.

I think we only need the use_gpu pipeline parameter and should get rid of the use_gpu configuration variable. The use_gpu label in the base.config file can be changed to,

process {
    withLabel: 'use_gpu' {
        ext.use_gpu = { params.use_gpu }
    }
}

At pipeline execution, the user must set the gpu profile and the use_gpu pipeline parameter to utilise the GPU-based tools.

  1. Processes with GPU support get the process_gpu label -> can be used to handle all GPU-enabled processes in a certain way. Different executors need different tweaking to handle tasks from these processes correctly. I added a bit of documentation here.

Thank you. This is very helpful.

  1. This section makes sure that processes have an ext variable which reflects if they should use GPU or not. This can be useful if the same module supports usage both with and without GPU. Example: cellbender

This is quite clever. Nice!
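The ext variable pattern described above can be sketched as a module that branches on task.ext.use_gpu. This is a hypothetical module sketch, not the actual cellbender module code; the process name and command-line flag are illustrative:

```groovy
// Hypothetical sketch: a module that supports both CPU and GPU execution,
// branching on the ext.use_gpu value set in base.config.
process CELLBENDER_REMOVEBACKGROUND {
    label 'process_gpu'

    input:
    path h5

    script:
    // task.ext.use_gpu is resolved from the withLabel block in base.config
    def device_flag = task.ext.use_gpu ? '--cuda' : ''
    """
    cellbender remove-background --input ${h5} ${device_flag}
    """
}
```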

@nictru
Contributor

nictru commented Nov 8, 2024

Hey @GallVp, thanks for the input.

Some notes:

It might sound stupid, but I did not know there was a difference between pipeline parameters and configuration variables. I agree that we should only have one.

I am not fully aware of what you can and can't do with configuration variables, and I was not able to find any documentation about them at all. I only used the configuration variable in scdownstream. Not sure if one can use configuration variables for workflow if-clauses?

  • If no, then the pipeline parameter should be preferred, but it should still be set to true by the gpu profile and hidden from the parameter documentation
  • If yes, I think then the configuration variable is better, because it prevents users from using one without the other

At pipeline execution, the user must set the gpu profile and the use_gpu pipeline parameter to utilise the GPU-based tools.

I would prefer if setting the gpu profile would be sufficient. Keeping them separate will probably lead to a lot of users using one without the other, which almost certainly will lead to errors. I am not aware of any use cases where this would be an advantage, but I'm eager to hear.

Contributor

@nictru nictru left a comment


Not approving yet, because we should first decide what the implementation should look like and add all necessary parts

@GallVp
Member

GallVp commented Nov 9, 2024

Not sure if one can use configuration variables for workflow if clauses?

No, configuration variables are not available in the workflows. Maybe, there is a backdoor that I am not aware of.

How about we get rid of both the variable and the parameter? Instead, we modify the profile as,

gpu {
    process {
        withLabel: process_gpu {
            ext.use_gpu = true
        }
    }
    docker.runOptions = '-u $(id -u):$(id -g) --gpus all'
    apptainer.runOptions = '--nv'
    singularity.runOptions = '--nv'
}

The workflows can use workflow.profile.contains('gpu') in place of the use_gpu parameter.

The above solution, however, does not take care of the arm profile. So, we need to have a gpu_arm profile as well.
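The workflow.profile.contains('gpu') check suggested above could be used in a workflow branch like this. A hedged sketch with hypothetical process and channel names, not code from the PR:

```groovy
// Illustrative if-clause replacing the use_gpu parameter with a profile
// check; CELLBENDER_GPU/CELLBENDER_CPU are hypothetical process names.
workflow DENOISE {
    take:
    ch_input

    main:
    if (workflow.profile.contains('gpu')) {
        CELLBENDER_GPU(ch_input)
    } else {
        CELLBENDER_CPU(ch_input)
    }
}
```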

@nictru
Contributor

nictru commented Nov 13, 2024

Hey, I just tested the suggested approach and it seems to work as expected.
I think we could do it like this, but I would prefer having the withLabel block in the base.config - just to make sure that all the withLabel configs are collected in one place.

I did this using the following:

withLabel: process_gpu {
    ext.use_gpu = {workflow.profile.contains('gpu')}
}

So that we don't need a config variable. But this is more a cosmetic topic and not a hill-to-die-on.

@mashehu
Contributor Author

mashehu commented Nov 13, 2024

feel free to add the changes, @nictru

Co-authored-by: Nico Trummer <nictru32@gmail.com>
@GallVp
Member

GallVp commented Nov 13, 2024

I did this using the following:

withLabel: process_gpu {
    ext.use_gpu = { workflow.profile.contains('gpu') }
}

I agree. This is more in line with the existing infrastructure. Thank you!

Contributor

@nictru nictru left a comment


I'm happy now :)

@ewels ewels modified the milestones: 3.1, 3.2 Nov 21, 2024
@ewels
Member

ewels commented Nov 21, 2024

Needs opt-in functionality in the nf-core pipelines create. See also #3261

@mirpedrol mirpedrol removed this from the 3.2 milestone Jan 27, 2025
@mirpedrol mirpedrol added this to the 3.3.0 milestone Jan 27, 2025
@mirpedrol
Member

🧹 spring cleaning message 🌷

What is the status of this PR? It looks ready after resolving the conflicts. Are you planning to finish it?

@mashehu
Contributor Author

mashehu commented Mar 11, 2025

still need to opt-in template features first

@jfy133
Member

jfy133 commented Apr 8, 2025

This PR is in demand @mirpedrol @mashehu 🙏

https://nfcore.slack.com/archives/C043UU89KKQ/p1743749138969479

Note that @adamrtalbot is running through a few use cases of how it would be used in configs, if this helps in any way: https://nfcore.slack.com/archives/C043UU89KKQ/p1744102571110729?thread_ts=1744098920.100049&cid=C043UU89KKQ

    maxRetries = 2
}
withLabel: process_gpu {
    ext.use_gpu = {workflow.profile.contains('gpu')}
Member


Suggested change
ext.use_gpu = {workflow.profile.contains('gpu')}
ext.use_gpu = { workflow.profile.contains('gpu') }

Member

@JoseEspinosa JoseEspinosa Apr 8, 2025


Then to set the accelerator for the individual processes as here, you suggest using something such as:

process {
    withName: 'RUN_ALPHAFOLD2' {
        accelerator = workflow.profile.contains('gpu') ? 1 : 0
    }
}

Instead of using a parameter?

Contributor


Yes, that's the idea.

Co-authored-by: Jose Espinosa-Carrasco <kadomu@gmail.com>
@mirpedrol
Member

This PR is in demand @mirpedrol @mashehu 🙏

In order to merge this, we need to add an opt-in functionality when creating the pipeline template.
Until then, pipelines that support GPUs can add these changes 👍 This will also serve as a POC and debugging before we add this to the template.

@jfy133
Member

jfy133 commented Apr 9, 2025

Out of curiosity, what exactly needs to be 'opt in' here?

Is there any harm having the label in the template so modules can start using it?

The label could also be optionally empty for now, but it would at least allow custom/institutional config-level settings

@mashehu
Contributor Author

mashehu commented Apr 10, 2025

we will also add gpu CI with that opt-in feature.

@mirpedrol
Member

What do you think about merging this now and adding it to all pipelines with the template update? But we agree on not adding GPU CI until we have the opt-in functionality. This profile will be included in that opt-in feature later, so it will be removed from all pipelines that don't select it.
Note that this will need a bit of planning and communication, otherwise some pipelines might need it and get it removed with the template update if they don't notice the new opt-in, so it's worth giving it a thought before merging this PR.

@jfy133
Member

jfy133 commented Apr 10, 2025

Could also do that. However I think the less 'breaky' method would be adding just the label as a valid label.

People can then already use that in their configs to 'replicate' the GPU profile in this PR themselves, until the GPU profile itself goes into the template. That way it doesn't get 'removed'.

i.e., the label is cheap, it can be in there without having to be opt-in, and it won't bother anyone

@mashehu
Contributor Author

mashehu commented Apr 10, 2025

sorry, I don't understand why having an empty label would help or what you mean with valid labels.

@jfy133
Member

jfy133 commented Apr 10, 2025

Because then modules can be validly labelled as having GPU support, and configs can actually do something with it

Then I can set in my institutional config

process {
    withLabel: process_gpu {
        queue = 'my_cluster_gpu_queue'
    }
}

Or whatever


@jfy133
Member

jfy133 commented Apr 11, 2025

See the examples here: nf-core/configs#876 (review)

@mirpedrol
Member

This would be at the modules guidelines level, no? We don't need the label in the template to be able to use it in the modules

@jfy133
Member

jfy133 commented Apr 11, 2025

This would be at the modules guidelines level, no? We don't need the label in the template to be able to use it in the modules

It can go there, but the documentation of what the valid labels are points to the pipeline template... (and I assume that's where your fixed list comes from when selecting this in the dropdown in modules create)

@mirpedrol
Member

I didn't think about the dropdown from modules create, that's true. But currently we only allow one label, so if we add this it would be a bigger change. I think it is better to allow people to add the label manually and not modify how modules create works.

@jfy133
Member

jfy133 commented Apr 11, 2025

I didn't think about the dropdown from modules create, that's true. But currently we only allow one label, so if we add this it would be a bigger change. I think it is better to allow people to add the label manually and not modify how modules create works.

Good point. But I still think it makes sense to have this in the pipeline template already, so we can refer to our 'standard', i.e. a single process_gpu rather than just gpu, gpu_low, or process_gpu_low, all of which we are now seeing crop up in configs 😅

@mashehu
Contributor Author

mashehu commented Apr 24, 2025

@jfy133 are you okay with merging this PR as it is now? The rest we discussed we can do later.

@jfy133
Member

jfy133 commented Apr 24, 2025

The bit in base.config is probably sufficient for now 👍

We may just want to consider whether it should be process_gpu_single, process_gpu_low, etc. for adjusting the number of GPUs, but I think we can assume those can be added later.
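Such sizing labels could look like the sketch below, by analogy with the existing process_single/process_low labels. This is speculative and not part of this PR; the label names and accelerator counts are assumptions:

```groovy
// Speculative GPU sizing labels (not in this PR); counts are illustrative.
process {
    withLabel: process_gpu_single { accelerator = 1 }
    withLabel: process_gpu_low    { accelerator = 2 }
}
```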

@mashehu mashehu merged commit 49f761d into nf-core:dev Apr 24, 2025
96 checks passed