Download pipelines with authenticated GH API calls #3607

jpfeuffer · 2025-06-05T15:20:41Z

For repos that follow the nf-core template but need authentication.

PR checklist

This comment contains a description of changes (with reason)
CHANGELOG.md is updated
If you've fixed a bug or added code that should be tested, add tests!
Documentation in docs is updated

jpfeuffer · 2025-06-05T15:21:56Z

Oops wrong base branch. Fixed.

jpfeuffer · 2025-06-06T09:29:12Z

Done. Test failures (most likely) unrelated or at least I have no idea what they mean.

MatthiasZepper · 2025-07-30T10:57:18Z

Done. Test failures (most likely) unrelated or at least I have no idea what they mean.

There is a problem mapping your CLI arguments to the parameters of the DownloadWorkflow class's __init__() function. They appear to be passed in the wrong order, which is why you get TypeError("object of type 'bool' has no len()").
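To illustrate the kind of mismatch described here, a minimal sketch follows. The class `DownloadSketch` and its parameters are hypothetical stand-ins, not the real DownloadWorkflow signature; the point is only that positional arguments passed in a stale order can put a bool where a sized value is expected, and that keyword arguments avoid this.

```python
class DownloadSketch:
    """Hypothetical stand-in for a download class; not nf-core's DownloadWorkflow."""

    def __init__(self, pipeline=None, revision=None, outdir=None, force=False):
        self.pipeline = pipeline
        self.revision = revision
        self.outdir = outdir
        self.force = force

    def revision_count(self):
        # len() fails if a bool was passed where a sequence was expected
        return len(self.revision) if self.revision is not None else 0


# Positional call written against a different parameter order:
# the bool meant for `force` lands in `revision`.
broken = DownloadSketch("owner/repo", True, "outdir")
try:
    broken.revision_count()
    message = ""
except TypeError as err:
    message = str(err)

# Keyword arguments keep working even if the parameter order changes.
fixed = DownloadSketch(pipeline="owner/repo", revision=("1.0.0",), outdir="outdir", force=True)
fixed_count = fixed.revision_count()
```

Passing every CLI value by keyword is one way to make such a call robust against reordering.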

Apart from that, please mind that there is a major refactor of Downloads ongoing (#3634). Preferably, all new contributions would already be based on and point to this new structure.

JulianFlesch · 2025-08-29T09:44:54Z

How does this fit with the pipeline downloads refactoring @MatthiasZepper @jpfeuffer?
Is this salvageable or maybe already covered by the PRs that were merged?

jpfeuffer · 2025-08-29T09:59:41Z

Good question. Who knows more about the refactor?
From what I gathered from @MatthiasZepper's comment, this might not yet be covered.
This is mainly about the repo download, not the download of the individual containers etc.
(Which from what I could see was the main focus of the refactor but judging from its size probably not the only focus)

jpfeuffer · 2025-08-29T10:21:45Z

Yes, I checked. And the mechanism for a download of the repo data/files is still the same.
The patch can be carried over as-is, just for the new folder structure.

jpfeuffer · 2025-08-29T10:23:21Z

And, apparently, the CLI needs a small fix. Note that I only added the CLI option because I still wanted to allow downloads from the zip URLs, since they will never be rate-limited, whereas unauthenticated downloads from the API can be.

codecov · 2025-09-18T13:39:19Z

Codecov Report

❌ Patch coverage is 60.00000% with 8 lines in your changes missing coverage. Please review.
✅ Project coverage is 77.26%. Comparing base (3afb111) to head (c113b81).
⚠️ Report is 7 commits behind head on dev.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| nf_core/pipelines/download/download.py | 66.66% | 5 Missing ⚠️ |
| nf_core/commands_pipelines.py | 0.00% | 3 Missing ⚠️ |

jpfeuffer · 2025-09-18T13:44:25Z

I quickly let Copilot rebase the changes (and fixed the CLI bug).
If you want more code coverage, you would need a test case that uses a Git token. I am not familiar enough with your internal CI to suggest a way forward here.
@JulianFlesch
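One possible way to cover the download path without a real token is to stub the HTTP getter. This is only a sketch: `fetch_zip_names` and `make_zip_bytes` are hypothetical helpers, and the fake session merely mimics the `.get(url).content` shape that the diff uses for gh_api; it says nothing about how the project's CI is actually set up.

```python
import io
import zipfile
from unittest import mock


def fetch_zip_names(session, url):
    """Download a zip via session.get(url) and return its member names."""
    response = session.get(url)
    with zipfile.ZipFile(io.BytesIO(response.content)) as zf:
        return zf.namelist()


def make_zip_bytes(members):
    """Build an in-memory zip archive from a {name: text} mapping."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, data in members.items():
            zf.writestr(name, data)
    return buf.getvalue()


# The fake session stands in for an authenticated client; no token or network is needed.
fake_session = mock.Mock()
fake_session.get.return_value = mock.Mock(
    content=make_zip_bytes({"owner-repo-abc123/main.nf": "workflow {}"})
)

url = "https://api.github.com/repos/owner/repo/zipball/abc123"
names = fetch_zip_names(fake_session, url)

# Verify that the code under test requested exactly the expected URL.
fake_session.get.assert_called_once_with(url)
```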

nf_core/commands_pipelines.py

MatthiasZepper

Julian has kindly taken over the maintainer role for Downloads, so I leave the ultimate decision to him, but I am leaning towards a few changes still.

nf_core/pipelines/download/download.py

MatthiasZepper · 2025-09-23T15:17:29Z

nf_core/__main__.py

    default=4,
    help="Number of allowed parallel tasks",
 )
+@click.option(


I have some reservations regarding the --api-download option from a user experience perspective. The current solution exposes implementation details that users shouldn't need to think about. Instead, we could focus on what the user actually wants to achieve: authenticated vs. anonymous downloads.

Consider something like --authenticated with a help text like "Enable authenticated download (with better rate limits, access to private repos, etc.)" instead.

Potentially, even --auth-method <method> to future-proof it, if we want to support multiple authentication methods in the future. For example, the option to download pipelines not hosted on GitHub has also been a long-standing request and similar features could then be successively added without renaming / replacing too many CLI arguments.
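A rough sketch of how such a flag could look from the user's side. nf-core uses click for its CLI; argparse from the standard library is used here purely to keep the example dependency-free, and the program name `download-sketch` is made up.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a tiny parser with an intent-based --authenticated flag."""
    parser = argparse.ArgumentParser(prog="download-sketch")
    parser.add_argument("pipeline", help="Pipeline name, e.g. owner/repo")
    parser.add_argument(
        "--authenticated",
        action="store_true",
        help="Enable authenticated download (with better rate limits, access to private repos, etc.)",
    )
    return parser


# The flag is off by default and carries no value of its own.
args = build_parser().parse_args(["owner/repo", "--authenticated"])
default_args = build_parser().parse_args(["owner/repo"])
```

The flag names what the user wants (an authenticated download) rather than which endpoint is used to achieve it.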

@MatthiasZepper while I agree with the switch to authenticated, I am not sure if auth_method is helpful here, even in the future.
I think there is usually only one method for each SCM/git provider and even if there are more, you would require extensive logic to make sure that people do not use "github authentication" when actually wanting to download from gitlab.

If we just use gh_api.get instead of requests.get in nf_core/pipelines/download/download.py, no new flag should be needed at all. Or am I missing something?

I vaguely recall that the problem was the need to authenticate at the API even for public repositories. For us developers, who anyway have some key-based authentication or token set up for GitHub, this is essentially unnoticeable.

But for ordinary users who are just getting started with nf-core and would like to download their first pipeline, this represents a significant obstacle, since they barely understand the error message or might not even have a GitHub account.

Yes, kind of. I think it is possible to use the API unauthenticated too, but you would be rate-limited more easily (which is not the case for the public zip download URL, afaik). Since I don't know how many requests the CI or a power user makes in a short timeframe, I decided to put it behind a flag.
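For context on the rate-limit concern, GitHub's REST API reports the remaining quota in the x-ratelimit-remaining and x-ratelimit-reset response headers. A small sketch of reading them follows; the helper `rate_limit_status` is hypothetical and the header values below are invented for illustration.

```python
from datetime import datetime, timezone


def rate_limit_status(headers):
    """Summarise GitHub rate-limit headers from a response's header mapping."""
    # Header names are case-insensitive, so normalise them first.
    lowered = {key.lower(): value for key, value in headers.items()}
    remaining = int(lowered["x-ratelimit-remaining"])
    # The reset header holds a UTC epoch timestamp in seconds.
    reset_at = datetime.fromtimestamp(int(lowered["x-ratelimit-reset"]), tz=timezone.utc)
    return {"exhausted": remaining == 0, "remaining": remaining, "reset_at": reset_at}


# Invented header values, shaped like those of an exhausted unauthenticated quota.
example = rate_limit_status(
    {
        "X-RateLimit-Limit": "60",
        "X-RateLimit-Remaining": "0",
        "X-RateLimit-Reset": "1760000000",
    }
)
```

A check like this could produce a clearer error message when an anonymous user runs out of quota.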

MatthiasZepper · 2025-09-23T15:27:28Z

nf_core/pipelines/download/download.py

-        url = requests.get(download_url)
-        with ZipFile(io.BytesIO(url.content)) as zipfile:
-            zipfile.extractall(self.outdir)
+        if not self.api_download:


I think the control flow in this function could be cleaner, which would make the code easier to maintain.

For example, the download_url for the anonymous download is assembled in line 467ff:

    if not self.platform:
        for revision, wf_sha in self.wf_sha.items():
            # Set the download URL and return - only applicable for classic downloads
            self.wf_download_url = {
                **self.wf_download_url,
                revision: f"https://github.com/{self.pipeline}/archive/{wf_sha}.zip",
            }

That is logically where also the api_url should be created.

If you refactor the logic of the function, you can also reduce some code duplication in the ZipFile part and group the topdir and the requests.get(...).content / gh_api.get(...).content calls closer together for clearer logic.
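As a sketch of what building both URL variants in one place might look like: the function name `build_download_url` is hypothetical, while the two URL patterns are the ones that appear in this PR.

```python
def build_download_url(pipeline: str, wf_sha: str, authenticated: bool) -> str:
    """Return the zip download URL for one revision of a pipeline repository."""
    if authenticated:
        # GitHub API zipball endpoint, usable with a token
        return f"https://api.github.com/repos/{pipeline}/zipball/{wf_sha}"
    # Public archive URL for anonymous downloads
    return f"https://github.com/{pipeline}/archive/{wf_sha}.zip"


# Hypothetical revision-to-SHA mapping, mirroring the shape of self.wf_sha.
wf_sha = {"1.0.0": "abc123"}
api_urls = {rev: build_download_url("owner/repo", sha, authenticated=True) for rev, sha in wf_sha.items()}
zip_urls = {rev: build_download_url("owner/repo", sha, authenticated=False) for rev, sha in wf_sha.items()}
```

With the URL decided up front, the download function itself only needs a single fetch-and-extract path.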

Fair enough. I'll see what I can do.

nf_core/commands_pipelines.py

- Rename --api-download to --authenticated for better UX
- Replace os.rename with Pathlib operations
- Refactor download_wf_files method to reduce code duplication
- Rename compress to compress_type consistently across codebase
- Update all references and tests accordingly

JulianFlesch

Sorry for taking so long with this review. The repo download could probably be simplified further, and I think we should be able to get away without adding a new flag, if I am not missing something.

Is it also a target of this PR to add downloads from different sources (i.e. private repos)?

JulianFlesch · 2025-10-13T21:09:39Z

nf_core/pipelines/download/download.py

-        url = requests.get(download_url)
-        with ZipFile(io.BytesIO(url.content)) as zipfile:
-            zipfile.extractall(self.outdir)
+        if not self.api_download:


I think there should be no condition here at all. From what it looks like, gh_api can just be used in both conditions either way.

As Matthias points out, instead of creating a url, we should use download_url.

JulianFlesch · 2025-10-13T21:24:34Z

nf_core/__main__.py

    default=4,
    help="Number of allowed parallel tasks",
 )
+@click.option(


If we just use gh_api.get instead of requests.get in nf_core/pipelines/download/download.py, no new flag should be needed at all. Or am I missing something?

jpfeuffer · 2025-10-13T22:09:00Z

Yep, private/internal (GitHub) repos should be supported by this. This is basically my use case.

The download_url unification should be addressed by my latest changes.

JulianFlesch · 2025-10-13T22:31:49Z

nf_core/pipelines/download/download.py

+                # Set the download URL - only applicable for classic downloads
+                if self.authenticated:
+                    # For authenticated downloads, use the GitHub API
+                    self.wf_download_url = {
+                        **self.wf_download_url,
+                        revision: f"https://api.github.com/repos/{self.pipeline}/zipball/{wf_sha}",
+                    }
+                else:
+                    # For unauthenticated downloads, use the archive URL
+                    self.wf_download_url = {
+                        **self.wf_download_url,
+                        revision: f"https://github.com/{self.pipeline}/archive/{wf_sha}.zip",
+                    }


Suggested change

-                # Set the download URL - only applicable for classic downloads
-                if self.authenticated:
-                    # For authenticated downloads, use the GitHub API
-                    self.wf_download_url = {
-                        **self.wf_download_url,
-                        revision: f"https://api.github.com/repos/{self.pipeline}/zipball/{wf_sha}",
-                    }
-                else:
-                    # For unauthenticated downloads, use the archive URL
-                    self.wf_download_url = {
-                        **self.wf_download_url,
-                        revision: f"https://github.com/{self.pipeline}/archive/{wf_sha}.zip",
-                    }
+                # Set the download url to use the GitHub API
+                self.wf_download_url = {
+                    **self.wf_download_url,
+                    revision: f"https://api.github.com/repos/{self.pipeline}/zipball/{wf_sha}",
+                }

JulianFlesch · 2025-10-13T22:33:27Z

nf_core/pipelines/download/download.py

+        # Fetch content and determine top-level directory based on authentication method
+        if self.authenticated:
+            # GitHub API download: fetch via API and get topdir from zip contents
+            content = gh_api.get(download_url).content
+            with ZipFile(io.BytesIO(content)) as zipfile:
+                topdir = zipfile.namelist()[0]  # API zipballs have a generated directory name
+                zipfile.extractall(self.outdir)
+        else:
+            # Direct URL download: fetch and construct expected topdir name
+            content = requests.get(download_url).content
+            topdir = f"{self.pipeline}-{wf_sha if bool(wf_sha) else ''}".split("/")[-1]
+            with ZipFile(io.BytesIO(content)) as zipfile:
+                zipfile.extractall(self.outdir)


Suggested change

-        # Fetch content and determine top-level directory based on authentication method
-        if self.authenticated:
-            # GitHub API download: fetch via API and get topdir from zip contents
-            content = gh_api.get(download_url).content
-            with ZipFile(io.BytesIO(content)) as zipfile:
-                topdir = zipfile.namelist()[0]  # API zipballs have a generated directory name
-                zipfile.extractall(self.outdir)
-        else:
-            # Direct URL download: fetch and construct expected topdir name
-            content = requests.get(download_url).content
-            topdir = f"{self.pipeline}-{wf_sha if bool(wf_sha) else ''}".split("/")[-1]
-            with ZipFile(io.BytesIO(content)) as zipfile:
-                zipfile.extractall(self.outdir)
+        # GitHub API download: fetch via API and get topdir from zip contents
+        content = gh_api.get(download_url).content
+        with ZipFile(io.BytesIO(content)) as zipfile:
+            topdir = zipfile.namelist()[0]  # API zipballs have a generated directory name
+            zipfile.extractall(self.outdir)

I am talking about replacing the old way of downloading with your GitHub API URLs. This also works with unauthenticated requests (within a quota), so we can reduce complexity and remove the new parameter.

What are your thoughts @MatthiasZepper ?
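The single-path approach proposed in the suggestion above (read the top-level directory from the archive itself, then rename it) could be sketched roughly as follows. The helper `extract_and_rename` and the archive contents are hypothetical; the in-memory zip only imitates the layout of a zipball, where all files sit under one generated directory.

```python
import io
import tempfile
import zipfile
from pathlib import Path


def extract_and_rename(zip_bytes: bytes, outdir: Path, target_name: str) -> Path:
    """Extract an archive and rename its top-level directory to target_name."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        # First path component of the first member is the top-level directory
        topdir = zf.namelist()[0].split("/")[0]
        zf.extractall(outdir)
    target = outdir / target_name
    # pathlib rename instead of os.rename
    (outdir / topdir).rename(target)
    return target


# Build a small archive in memory so that no network access is needed.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("owner-repo-abc123/main.nf", "workflow {}")

with tempfile.TemporaryDirectory() as tmp:
    result = extract_and_rename(buf.getvalue(), Path(tmp), "abc123")
    result_name = result.name
    extracted_ok = (result / "main.nf").read_text() == "workflow {}"
```

Because the directory name comes from the archive, the same code works regardless of which endpoint produced the zip.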

jpfeuffer requested a review from fabianegli June 5, 2025 15:20

jpfeuffer changed the base branch from main to dev June 5, 2025 15:21

jpfeuffer requested a review from mashehu June 6, 2025 09:29

This was referenced Aug 22, 2025

nf-core download should support private repos #2406 (Open)

Refactor pipeline downloads command to use nextflow inspect for container detection #3634 (Merged)

JulianFlesch self-assigned this Sep 4, 2025

JulianFlesch added this to nf-core infrastructure projects Sep 9, 2025

github-project-automation bot moved this to Todo in nf-core infrastructure projects Sep 9, 2025

jpfeuffer added 10 commits September 18, 2025 14:51

Download pipeline with auth. GH API

ae66e2e

remove debug

191f685

ruff linter

eff74f4

good old indent on empty line

dfaf3fe

read topdir from zip for rename

9ab9642

ruff....

bb094e2

add a cli flag to avoid rate limit when unauthenticated

b423285

fix test

146632f

fix test: Remove api-download param from test dict since it's a boolean flag

651a0ca

add testing artifacts to gitignore

592d621

jpfeuffer force-pushed the jpfeuffer-patch-1 branch from c8d134c to 592d621 Compare September 18, 2025 13:31

lint

e77b261

jpfeuffer requested review from JulianFlesch and MatthiasZepper and removed request for fabianegli and mashehu September 18, 2025 14:17

rrahn reviewed Sep 22, 2025

MatthiasZepper requested changes Sep 23, 2025

github-project-automation bot moved this from Todo to In Progress in nf-core infrastructure projects Sep 23, 2025

JulianFlesch modified the milestones: 3.4.0, 3.5.0 Oct 7, 2025

jpfeuffer added 4 commits October 10, 2025 10:34

remove duplication

09a8fb0

better code flow.

7310e75

fmt

9b8d397

JulianFlesch requested changes Oct 13, 2025

Merge branch 'dev' into jpfeuffer-patch-1

c113b81

JulianFlesch requested changes Oct 13, 2025
