
Add option to use S3 accelerated endpoint for faster transfers #3675


Open
wants to merge 1 commit into
base: master


pchoisel
Contributor

No description provided.

pchoisel force-pushed the add_option_to_use_S3_accelerated_endpoint branch from 3164587 to 8e3966b on May 26, 2025 14:29
@manthey
Member

manthey commented May 27, 2025

In a quick read about the use_accelerate_endpoint, it sounds like it only has benefit when the client is in a different region from the bucket and the files are largish. Should there be any sort of check for either of these conditions, since using use_accelerate_endpoint incurs higher transfer costs?

@pchoisel
Contributor Author

> In a quick read about the use_accelerate_endpoint, it sounds like it only has benefit when the client is in a different region from the bucket and the files are largish. Should there be any sort of check for either of these conditions, since using use_accelerate_endpoint incurs higher transfer costs?

That sounds good, but I'm not sure how to implement this.
I can easily find the location of the S3 bucket, but it's harder to get the location of the client. Maybe using its IP address? But that might not work if Girder is behind a reverse proxy.

@zachmullen
Member

@pchoisel it might help to explain the problem you're trying to solve, then we may be able to provide better input on design.

@pchoisel
Contributor Author

@zachmullen Thanks.
A customer wants to enable S3 transfer acceleration for his users that are not in the US to speed up their uploads/downloads to a Girder-based application using an S3 assetstore.
You can enable that by changing the domain name used to connect to S3, but Boto does that automatically if you set use_accelerate_endpoint in the config.
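
For reference, the domain-name change mentioned above can be sketched as follows. With boto3, the equivalent switch is `Config(s3={'use_accelerate_endpoint': True})` passed to the client. The helper name `s3_host` is made up for illustration and is not part of Girder or Boto; the sketch only shows which hostname ends up being used, assuming virtual-hosted-style addressing.

```python
import re

# Buckets used with Transfer Acceleration must have DNS-compliant names
# that contain no periods (3 to 63 lowercase letters, digits, or hyphens).
_ACCELERATABLE_BUCKET = re.compile(r'^[a-z0-9][a-z0-9-]{1,61}[a-z0-9]$')


def s3_host(bucket, region, accelerate=False):
    """Return the virtual-hosted-style hostname for a bucket.

    Hypothetical helper: it mirrors the hostname swap that Boto performs
    when use_accelerate_endpoint is set, it does not call AWS.
    """
    if accelerate:
        if not _ACCELERATABLE_BUCKET.match(bucket):
            raise ValueError(
                'bucket %r cannot use the accelerate endpoint' % bucket)
        # The accelerate endpoint is global, so the region is not part of it.
        return '%s.s3-accelerate.amazonaws.com' % bucket
    return '%s.s3.%s.amazonaws.com' % (bucket, region)


print(s3_host('my-bucket', 'eu-west-3'))
print(s3_host('my-bucket', 'eu-west-3', accelerate=True))
```

Note that a presigned URL has to be generated against the accelerated host from the start; swapping the hostname in an already signed URL would invalidate its signature.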

To make this more optimized, I could perhaps use this accelerated endpoint only when Girder sends a transfer link to a user so their client can transfer data to S3, and not when Girder directly transfers data to S3?

@zachmullen
Member

> To make this more optimized, I could perhaps use this accelerated endpoint only when Girder sends a transfer link to a user so their client can transfer data to S3, and not when Girder directly transfers data to S3?

I do think this is the right place to make the choice -- at the point of building the presigned URLs rather than bound to the lifetime of the assetstore. I see two possibilities:

  1. We attempt to infer the right decision based on properties of the client, e.g. geolocation based on the client IP, and make the choice for them
  2. We allow the caller of the REST endpoint(s) to declare that they want to use accelerated transfer, moving the decision point to the client-side or the end users themselves (e.g. a checkbox to enable or disable it under "advanced options" or something).

I think option 2 is a much better idea, but am open to discussion.

@pchoisel
Contributor Author

pchoisel commented Jun 2, 2025

I agree that option 2 is the best. However, there are some clients that I cannot change easily (#girderwebcomponents).
But that's fine, I'll just patch them a bit more. Inferring the location of the client from its IP address seems really shaky.

Do you think I should still add an assetstore option to enable accelerated transfer? That would make the endpoints return an error if it's disabled but the client requested an accelerated transfer.

@zachmullen
Member

Yes, we should probably only allow accelerated transfer if the assetstore explicitly allows it.
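
A minimal sketch of the rule agreed on here: the client opts in per request, and the assetstore has to allow it explicitly. This is not the code from the PR. The assetstore is modelled as a plain dict, and the `allowAcceleration` key, the exception class, and the function name are all assumptions made for illustration.

```python
class AccelerationNotAllowed(Exception):
    """Raised when a client asks for acceleration the assetstore forbids."""


def should_accelerate(assetstore, client_requested):
    """Decide whether a presigned URL should use the accelerate endpoint.

    `assetstore` is a plain dict in this sketch, and `allowAcceleration`
    is a hypothetical key name, not an actual Girder field.
    """
    if not client_requested:
        # Default behaviour is unchanged: regular endpoint, regular cost.
        return False
    if not assetstore.get('allowAcceleration', False):
        # The client opted in, but the assetstore owner did not.
        raise AccelerationNotAllowed(
            'Accelerated transfer is not enabled on this assetstore.')
    return True


print(should_accelerate({'allowAcceleration': True}, client_requested=True))
print(should_accelerate({'allowAcceleration': True}, client_requested=False))
```

Raising instead of silently falling back to the regular endpoint matches the behaviour proposed above, where the endpoint returns an error if acceleration is disabled but requested.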

pchoisel force-pushed the add_option_to_use_S3_accelerated_endpoint branch 2 times, most recently from e42bb5c to 04d9618 on June 5, 2025 12:57
@pchoisel
Contributor Author

pchoisel commented Jun 5, 2025

Here it is. I originally wanted to store the fact that an upload should be made with acceleration in the upload document, but I was worried about S3 usage being restricted in between the upload init and the chunk upload.

Also, using acceleration just for uploading chunks and using the regular URL for the rest of the requests (completion for example) works fine.

I changed the extraParameters arg of the download API from a param to a jsonParam. I couldn't find anything using it and I think it makes more sense. Let me know if I should revert that.

S3 Transfer Acceleration is an AWS feature that speeds up data
transfer between a client and an S3 bucket using CloudFront edge locations.
pchoisel force-pushed the add_option_to_use_S3_accelerated_endpoint branch from 04d9618 to a902a13 on June 12, 2025 07:57
@pchoisel
Contributor Author

@manthey If you have some time, could you have a look?

@@ -254,8 +262,8 @@ def readChunk(self, upload, offset, params):
 .param('contentDisposition', 'Specify the Content-Disposition response '
 'header disposition-type value.', required=False,
 enum=['inline', 'attachment'], default='attachment')
-.param('extraParameters', 'Arbitrary data to send along with the download request.',
-required=False)
+.jsonParam('extraParameters', 'Arbitrary data to send along with the download request.',
Member


I'm not sure what the ramifications of changing from param to jsonParam will be. This switches the extra parameters from a string to a parsed JSON object. The only place I see it used before this is in downloads.
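
To illustrate the concern, here is a hedged sketch of a tolerant parser that accepts both the old string form and a JSON-encoded value. The helper name `parse_extra_parameters` and the `accelerate` key are hypothetical and not part of Girder; the sketch only shows how a string value and a parsed object differ for callers.

```python
import json


def parse_extra_parameters(raw):
    """Accept extraParameters as None, a JSON string, or an opaque string.

    Hypothetical compatibility helper: callers that used to send an
    arbitrary string keep working, while JSON payloads are decoded.
    """
    if raw is None:
        return None
    if not isinstance(raw, str):
        # Already decoded upstream (e.g. by a JSON-aware parameter parser).
        return raw
    try:
        return json.loads(raw)
    except ValueError:
        # Not valid JSON: hand the original string through unchanged.
        return raw


print(parse_extra_parameters('{"accelerate": true}'))
print(parse_extra_parameters('plain text'))
```

One caveat with this approach: strings that happen to be valid JSON scalars, such as `123` or `true`, are decoded to non-string values, so existing callers that relied on receiving a string could still see a change.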
