Add a handler for compressing HTTP requests #15401
base: 4.2
Conversation
Motivation:

Add a reusable handler that compresses (encodes) HTTP requests and close the long pending feature request netty#2132.

Modifications:

Adds a new `ChannelOutboundHandler` named `HttpRequestCompressor` to the `io.netty.handler.codec.http` package. The handler supports the same encodings as `HttpContentEncoder` for HTTP responses.

Result:

After the change, you will be able to compress the body of HTTP requests.

```java
HttpClient
    .create()
    // add the handler
    .observe((con, newState) -> {
        // use CONFIGURED for HTTP/1.1 connections
        // use STREAM_CONFIGURED for HTTP/2 connections
        if (newState == HttpClientState.CONFIGURED
                || newState == HttpClientState.STREAM_CONFIGURED) {
            con.addHandlerLast(new HttpRequestCompressor("gzip"));
        }
    })
    .build();
```
```java
 * @see StandardCompressionOptions#deflate() default deflate options
 */
public HttpRequestCompressor(String preferredEncoding, int contentSizeThreshold,
                             CompressionOptions... compressionOptions) {
```
Maybe we should use a builder pattern preemptively. Netty has a history of constructor parameters getting out of control over time.
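For illustration, a minimal sketch of what such a builder could look like; the class name, setters, and defaults here are assumptions, not part of this PR:

```java
import io.netty.handler.codec.compression.CompressionOptions;
import io.netty.handler.codec.compression.StandardCompressionOptions;
import io.netty.handler.codec.http.HttpRequestCompressor;

// Hypothetical builder sketch; it only wraps the constructor shown in the diff above.
public final class HttpRequestCompressorBuilder {
    private String preferredEncoding = "gzip";
    private int contentSizeThreshold;
    private CompressionOptions[] compressionOptions = { StandardCompressionOptions.gzip() };

    public HttpRequestCompressorBuilder preferredEncoding(String encoding) {
        this.preferredEncoding = encoding;
        return this;
    }

    public HttpRequestCompressorBuilder contentSizeThreshold(int threshold) {
        this.contentSizeThreshold = threshold;
        return this;
    }

    public HttpRequestCompressorBuilder compressionOptions(CompressionOptions... options) {
        this.compressionOptions = options;
        return this;
    }

    public HttpRequestCompressor build() {
        return new HttpRequestCompressor(preferredEncoding, contentSizeThreshold, compressionOptions);
    }
}
```

Adding parameters later then becomes a new builder method instead of another constructor overload.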
```java
    // received HttpContent without previous HttpRequest ... likely because we skipped it
    ctx.write(content, promise);
} else {
    buf.writeBytes(content);
```
We're leaking the content here, be it a ByteBuf or a HttpContent object.
This also buffers the contents uncontrollably. We should start streaming out compressed chunks as soon as we hit the minimum threshold. Imagine a big file upload going through here, for instance.
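For the leak specifically, a minimal sketch of one way to copy the chunk and then release it; the helper name is hypothetical and this only illustrates the reference-counting fix, not the streaming concern:

```java
import io.netty.buffer.ByteBuf;
import io.netty.handler.codec.http.HttpContent;
import io.netty.util.ReferenceCountUtil;

final class BufferingSketch {
    // Copy the chunk's readable bytes into the accumulation buffer, then release the
    // original HttpContent so its reference count is decremented even on failure.
    static void bufferChunk(ByteBuf accumulation, HttpContent content) {
        try {
            accumulation.writeBytes(content.content());
        } finally {
            ReferenceCountUtil.release(content);
        }
    }
}
```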
This is not how chunked HTTP requests work. Content encoding (compression) does not apply to each individual chunk, but to the entirety (the actual content). The receiver merges all chunks in the order in which they were received and applies the reverse operation to the content encoding (decompression). If you were to send individual, compressed chunks, the recipient would end up with garbage. This handler behaves in a similar way to a receiver: it waits until it has received all the chunks and then compresses the content (previously split into chunks).
You don't need to do full compression on chunks. Just stream the chunks through the compressor and on the LastHttpContent, finish the compression.
@chrisvest concern is valid. If someone is sending a 10G payload, we will end up with the whole compressed payload in memory before this handler receives LastHttpContent.
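A minimal sketch of that idea, assuming a gzip encoder driven through an `EmbeddedChannel`; the class and method names are hypothetical and only illustrate streaming each chunk through the compressor and finishing the stream on `LastHttpContent`:

```java
import io.netty.buffer.ByteBuf;
import io.netty.channel.embedded.EmbeddedChannel;
import io.netty.handler.codec.compression.ZlibCodecFactory;
import io.netty.handler.codec.compression.ZlibWrapper;

public final class StreamingCompressSketch {
    private final EmbeddedChannel encoder =
            new EmbeddedChannel(ZlibCodecFactory.newZlibEncoder(ZlibWrapper.GZIP));

    // Compress one chunk and return whatever compressed bytes are available so far,
    // so they can be written downstream immediately instead of being accumulated.
    ByteBuf compressChunk(ByteBuf chunk) {
        // retain because the encoder releases the message it consumes
        encoder.writeOutbound(chunk.retain());
        return readAvailable();
    }

    // On LastHttpContent: finish the stream and drain the remaining trailer bytes.
    ByteBuf finishStream() {
        encoder.finish();
        return readAvailable();
    }

    private ByteBuf readAvailable() {
        ByteBuf out = encoder.alloc().buffer();
        for (ByteBuf part; (part = encoder.readOutbound()) != null; ) {
            out.writeBytes(part);
            part.release();
        }
        return out;
    }
}
```

This sketch assumes gzip, where compressed output is produced per written message; encoders that buffer until flush or close behave differently, as discussed below.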
The handler still has to wait and buffer the payload until the size reaches the configured threshold, and it can happen that the size is still below the threshold at the end. It also depends on the compressor: the ZstdEncoder, for example, waits until flush is called, and the BrotliEncoder waits for close (its encode method does nothing).
So in the end it just gets more complicated and the result can still be the same: the content is buffered, either by the handler or by the encoder. I would therefore like to leave it as it is.
Large files are best compressed in advance anyway, rather than with on-the-fly compression.
We should support streaming compression. If brotli doesn't support that, that's a bug in the brotli impl.
Brotli4j supports streaming compression. It is already implemented in Netty's response encoder.
Compressed chunks are now written to the context as quickly as possible, i.e. as soon as the threshold value is reached.
The following situations are taken into account:
- the threshold value is reached with the first chunk
- the threshold is reached with a later chunk
- the threshold is reached with the `LastHttpContent` chunk. In this case, the request is sent as a compressed `FullHttpRequest`.
- the threshold is never reached. In this case, the request is sent as an uncompressed `FullHttpRequest`.
The hex values of the compressed content used in the test were checked with other tools to ensure that they can be properly decoded. The command was
`echo -n "${hexdump}" | xxd -r -p | $command` where `$command` is one of the following commands: `gzip -d`, `zstd -d`, `zlib-flate -uncompress`, `snappy -d`, `brotli -d`.
I have inserted line breaks for better readability. For zstd, br and snappy, you can clearly see the hex values of the individual characters used as chunks if you look at the last two characters of the encoded chunks.
hyperxpro left a comment:
Some gotchas
```java
private void handleChunkedHttpRequest(ChannelHandlerContext ctx, ChannelPromise promise, HttpRequest req) {
    // only process requests that are allowed to have content
    if (req.method().equals(HttpMethod.POST)
```
The RFC does not prevent GET from having a payload.
See: https://datatracker.ietf.org/doc/html/rfc7231#section-4.3.1
We should not have these checks.
However, it is common practice that only the POST, PUT and PATCH methods have content. With GET in particular, the content is ignored or even rejected by most servers and clients. I think this is therefore a fair assumption and optimization. But I have to admit that if there are such checks for chunked requests, they should also be done for `FullHttpRequest`, and they should be documented.
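For reference, a minimal sketch of such a method check, assuming only the common "may have a body" verbs; the helper name is hypothetical:

```java
import io.netty.handler.codec.http.HttpMethod;
import io.netty.handler.codec.http.HttpRequest;

final class MethodCheckSketch {
    // Hypothetical check: only treat POST, PUT and PATCH requests as having a body
    // worth compressing; all other methods would be passed through untouched.
    static boolean mayHaveContent(HttpRequest req) {
        HttpMethod method = req.method();
        return HttpMethod.POST.equals(method)
                || HttpMethod.PUT.equals(method)
                || HttpMethod.PATCH.equals(method);
    }
}
```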
Let's not change semantics. You can write an HTTP server in Netty that accepts a 1 GB payload in a GET request. There is nothing to stop it, and the RFC does not forbid it either.
```java
if (encoderChannel == null) {
    buf.writeBytes(content);
    // buffered content exceeds threshold, start streaming and send stored HttpRequest
    if (buf.isReadable() && buf.readableBytes() >= contentSizeThreshold) {
```
Content Threshold applies to Content-Length of request, not individual chunks. If the Content-Length header is present, this logic will apply to the whole request (including upcoming chunks). If the request has Transfer-Encoding: chunked, it should always compress since there will be no visibility of content length.
> Content Threshold applies to Content-Length of request, not individual chunks.
Yes and no. For this handler, it is defined as the size of the content independent of the content-length header. The Javadoc for this parameter is very clear in this regard.
> the size in byte the http body must have before compressing the request
> this logic will apply to the whole request (including upcoming chunks).
That is exactly what it is intended for.
If this were ignored, the concept of a threshold for chunked requests would be pure nonsense. The idea of the threshold is to skip compression for small content, which would just be a waste of resources and produce an even bigger result than it originally was (as you can see from the examples in the test).
This is expected behaviour. The threshold only applies if there is a Content-Length. Nginx and others follow the same approach.
https://nginx.org/en/docs/http/ngx_http_gzip_module.html
You can have chunks that will be smaller than the threshold, and in that case, you cannot skip compressing them and send corrupted content into an already compressed request.
Say the threshold is 1 MB. The request will generate 5 chunks, the first 4 will be 2 MB, and the last one will be 100 KB. In this case, your logic will compress the first four chunks properly and skip the last one, resulting in a corrupted stream.
I'd suggest reading the Nginx source for a more in-depth look.
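A minimal sketch of that nginx-style rule, assuming the decision is made from the request headers alone; the helper name is hypothetical:

```java
import io.netty.handler.codec.http.HttpRequest;
import io.netty.handler.codec.http.HttpUtil;

final class ThresholdSketch {
    // Apply the threshold only when Content-Length is known; for chunked requests
    // the total size is unknown up front, so compression is always applied.
    static boolean shouldCompress(HttpRequest req, int contentSizeThreshold) {
        if (HttpUtil.isContentLengthSet(req)) {
            return HttpUtil.getContentLength(req) >= contentSizeThreshold;
        }
        return HttpUtil.isTransferEncodingChunked(req);
    }
}
```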
> In this case, your logic will compress the first four chunks properly and skip the last one, resulting in a corrupted stream.
No, as soon as the threshold value for all previous chunks together is reached, the subsequent chunks are also compressed. The channel with the encoder is only created when the threshold value is reached (handleHttpContent, line 291). If a new chunk is received, it is compressed immediately without being written to the internal buffer and without further checking whether the threshold value has been reached (handleHttpContent, line 304).
I really don't find this approach scalable because we will keep holding onto buffers until the threshold is reached to determine whether to compress or not. We can have a threshold of 100 MB, send 10 chunks of 10 MB each, and it will keep on accumulating in memory until 100 MB is reached.
@chrisvest WDYT?
That is the trade-off. I just give the user the option to choose and let them decide.
Other implementations also have the option of compressing only certain MIME types, in order to skip compression for poorly compressible or already compressed content.
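As an illustration, a minimal sketch of such a MIME-type filter; the header check and the set of types are assumptions, not part of this PR:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

import io.netty.handler.codec.http.HttpHeaderNames;
import io.netty.handler.codec.http.HttpRequest;

final class MimeTypeFilterSketch {
    // Content types that are typically already compressed and not worth re-compressing.
    private static final Set<String> SKIP = new HashSet<>(Arrays.asList(
            "image/png", "image/jpeg", "application/zip", "application/gzip"));

    static boolean isCompressible(HttpRequest req) {
        String contentType = req.headers().get(HttpHeaderNames.CONTENT_TYPE);
        return contentType == null || !SKIP.contains(contentType);
    }
}
```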
```java
 * @param in the remaining content to compress
 * @param out the destination of the compressed content
 */
private void finishCompress(EmbeddedChannel encoderChannel, ByteBuf in, ByteBuf out) {
```
Do we really need EmbeddedChannel?
What should be used then? EmbeddedChannel is also used by HttpContentEncoder.
Motivation:
Add a reusable handler that compresses (encodes) HTTP requests and close the long pending feature request #2132.
Modifications:
Adds a new `ChannelOutboundHandler` named `HttpRequestCompressor` to the `io.netty.handler.codec.http` package. The handler supports the same encodings as `HttpContentEncoder` for HTTP responses.

Result:
After the change, you will be able to compress the body of HTTP requests.
Fixes #2132