@jjaakola-aiven jjaakola-aiven commented Sep 23, 2025

Notes:

  • Uses direct memory buffers. It is recommended to run Diskless with "-Dio.netty.maxDirectMemory=0" so that the Netty cleaner runs.
  • Has a static connection pool with a maximum of 96 connections.
  • Has a static pool of 32 worker threads.
  • "SO_KEEPALIVE" is set for sockets and the keep-alive header is set for HTTP.
  • Compression is disabled. Producer compression is recommended, and compressing again is likely not beneficial.
  • The GCS client handles redirects; redirect following in the Reactor Netty client is disabled.
  • Can use the static BoringSSL library to offload SSL to OpenSSL.
  • Zero-copy until response handling, where the direct memory buffer bytes are copied to a heap-managed byte array.
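
To illustrate the last note, here is a minimal sketch of the single copy from a direct (off-heap) buffer into a heap byte array. It uses only `java.nio` rather than Netty's `ByteBuf`, and the class and method names (`DirectToHeapCopy`, `copyToHeap`) are hypothetical, not taken from this PR.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class DirectToHeapCopy {

    // Copies the readable bytes of a (possibly direct) buffer into a new
    // heap byte array. This is the one copy the note refers to.
    static byte[] copyToHeap(ByteBuffer source) {
        // duplicate() shares the content but has its own position and limit,
        // so the caller's buffer is left untouched.
        ByteBuffer view = source.duplicate();
        byte[] heap = new byte[view.remaining()];
        view.get(heap);
        return heap;
    }

    public static void main(String[] args) {
        // Stand-in for a response body held in direct memory.
        ByteBuffer direct = ByteBuffer.allocateDirect(16);
        direct.put("payload".getBytes(StandardCharsets.UTF_8));
        direct.flip();

        byte[] heap = copyToHeap(direct);
        System.out.println(new String(heap, StandardCharsets.UTF_8));
    }
}
```

Everything before this point can stay off-heap; the copy is only needed because the consumer expects a `byte[]`.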

@jjaakola-aiven force-pushed the jjaakola-aiven-gcs-storage-use-reactor-netty-http-client branch from 239c92c to 697fdfd on September 23, 2025 12:52
@agrawal-siddharth agrawal-siddharth left a comment

Here's some feedback and a suggested approach from Google GCS SDK team.

GCS has a gRPC-based API, which the GCS Java SDK supports[1] out of the box. My recommendation would be for them to use that if they want an async transport core: we've already done the work to bridge async to streaming (and sometimes non-blocking) channels, and it uses less CPU than the default NetHttpTransport.

I'm curious what they hope to gain by using Netty as the lower-level transport layer. Netty can be a great HTTP client, but the GCS Java SDK is not async in the vast majority of operations and will block the invoking thread as necessary. Additionally, we've already done a notable amount of work over the past few years to reduce memory usage and unnecessary heap allocations.

That said, if I were to attempt something like this, I would first try to get it working in the test suite[2] we already have for the GCS Java SDK: create a branch of the repo, modify HttpStorageOptions.HttpStorageDefault#getDefaultTransportOptions()[3] to return their Netty-based transport, then run the integration test suite with mvn -Dmaven.test.skip.exec=true -Penable-integration-tests clean verify

From a superficial standpoint, one challenge they will likely run into is the need to stream large amounts of bytes. The GCS Java SDK does not have any client-side limits on how large objects or their streams can be. This can be a challenge for Netty, because Netty operates primarily on ByteBufs. I know of users who are uploading multi-gigabyte objects on a single stream, and similarly reading many gigabytes in a streaming mode, often with application backpressure.

[1] https://cloud.google.com/storage/docs/enable-grpc-api
[2] https://github.com/googleapis/java-storage/tree/main/google-cloud-storage/src/test/java/com/google/cloud/storage
[3] https://github.com/googleapis/java-storage/blob/main/google-cloud-storage/src/main/java/com/google/cloud/storage/HttpStorageOptions.java#L341-L343

@jjaakola-aiven

@agrawal-siddharth Thank you for the comments. The intent here is to reduce CPU usage and byte copying when loading from GCS, so the main change is the possibility to offload the SSL handling to OpenSSL. The SSL handling is dominant in the CPU usage graphs. Also, the Java HTTP client used by the GCS client by default is not very easy to control, so this provides better control. I'll definitely look into the gRPC option, which seems to be generally available now and would provide a single socket with multiple streams.

The size of the files is not very large. Based on the benchmarks I have run, they vary between 2 MiB and 8 MiB, with an upper limit of 16 MiB. But I agree that for the general use case this must be able to handle large files.
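
As a rough illustration of the large-object concern, here is a minimal sketch of draining a stream through one reusable, fixed-size direct buffer, so memory use stays bounded regardless of object size. It uses only `java.nio` and assumes a blocking channel; `BoundedChunkReader`, `ChunkConsumer` and `drain` are hypothetical names, not part of this PR or the GCS SDK.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;

public class BoundedChunkReader {

    // Hypothetical callback; it must finish with the chunk before returning,
    // because the underlying buffer is reused for the next read.
    @FunctionalInterface
    interface ChunkConsumer {
        void accept(ByteBuffer chunk) throws IOException;
    }

    // Reads the channel to the end using a single direct buffer of chunkSize
    // bytes and returns the total number of bytes seen.
    // Assumes a blocking channel (a non-blocking one could return 0 repeatedly).
    static long drain(ReadableByteChannel source, int chunkSize, ChunkConsumer consumer)
            throws IOException {
        ByteBuffer buffer = ByteBuffer.allocateDirect(chunkSize);
        long total = 0;
        while (source.read(buffer) != -1) {
            buffer.flip();
            total += buffer.remaining();
            if (buffer.hasRemaining()) {
                // Read-only view so the consumer cannot overwrite the data.
                consumer.accept(buffer.asReadOnlyBuffer());
            }
            buffer.clear();
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for an 8 MiB object body.
        byte[] object = new byte[8 * 1024 * 1024];
        ReadableByteChannel channel = Channels.newChannel(new ByteArrayInputStream(object));
        long total = drain(channel, 1024 * 1024, chunk -> { /* process chunk here */ });
        System.out.println("bytes read: " + total);
    }
}
```

Because the consumer runs synchronously inside the read loop, a slow consumer naturally delays the next read, which is a simple form of backpressure.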
