Feature request: fast inflate that discards data, to find length #860
Replies: 18 comments
-
It would still need to partially reconstruct the decompressed data to guard against corrupted or invalid input, but copying the decompressed data to the user-supplied output buffer could be skipped to improve speed.
-
It seems like that could already be accomplished with inflateBack().
-
Calling a callback function would probably be even slower than using a normal optimized memcpy(), because callback functions need their own stack frame.
-
We can check to see if the callback is NULL.
-
@nmoinvaz That would actually make it slightly faster when the callback is NULL, even though it introduces a conditional jump. I just don't know how big a speed penalty it would incur if the callback is non-NULL.
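A minimal sketch of the NULL-callback idea being discussed (the helper names `deliver`, `out_cb`, and `count_cb` are hypothetical, not zlib-ng API): when no callback is registered, the copy/dispatch is skipped entirely and only the byte count is maintained.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical output hook: return nonzero to abort, as inflateBack() does. */
typedef int (*out_cb)(void *desc, const unsigned char *buf, unsigned len);

/* Deliver a chunk of decompressed output, or merely account for it when
   cb is NULL. Returns the number of bytes accepted, 0 on abort. */
static unsigned deliver(out_cb cb, void *desc,
                        const unsigned char *buf, unsigned len) {
    if (cb != NULL && cb(desc, buf, len) != 0)
        return 0;            /* callback asked us to stop */
    return len;              /* NULL callback: bytes are counted, not copied */
}

/* Example callback that tallies the bytes it is handed. */
static int count_cb(void *desc, const unsigned char *buf, unsigned len) {
    (void)buf;
    *(unsigned long *)desc += len;
    return 0;
}
```

The extra branch is what the trade-off above refers to: one well-predicted conditional jump per output chunk in exchange for skipping the copy when the caller only wants lengths.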
-
Yeah, it is definitely a trade-off. It ends up making things a bit faster for one scenario and a bit slower for another. Either way, I think @joshtriplett should be able to use inflateBack().
-
@mtl1979 wrote:
That's not quite what I'm looking for. If I need to validate the data, I can do a full inflate; the optimization of avoiding the final copy might be nice, but generally if I'm validating the data I'll also want to do something else with it, such as hash it, so I need the data anyway. In this case, I really do just want to know the length of the compressed data stream, so that I can skip past it and start processing the next record.
I'd like to avoid not just the overhead of the final copy, but also the overhead of the internal copies to process backreferences. If it'd be trivial to do, reconstructing the match-length and summing up the uncompressed length would be nice. Beyond that, I'd rather skip any checks for the validity of the deflate stream, if they'd add any overhead at all.
-
@nmoinvaz I had a quick glimpse at inflateBack(), and no-op'ing the output callback would require more than just checking that the output callback is non-NULL, as the code blindly writes to the output buffer until it's full...
-
@joshtriplett I think the easiest way would be to duplicate inflateBack() and strip out anything that writes data to the output window. Some lines would need to be modified to make sure the input pointer is advanced even after the copying and the advancing of the output pointer are removed. I'm not sure what the best way is to test that the modified code actually works as expected, except linking the code against git...
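To illustrate the kind of edit being described (these helpers are my own illustration, not inflateBack()'s actual inner loop): a back-reference of length `len` is normally copied byte-by-byte through the window, and the skip variant has to keep the same position bookkeeping while dropping the memory traffic.

```c
#include <assert.h>

/* Copying variant, in the spirit of inflateBack()'s core: resolve a
   back-reference of `len` bytes at distance `dist` through a circular
   window of size `wsize` (a power of two), returning the new position. */
static unsigned long emit_match_copy(unsigned char *window, unsigned wsize,
                                     unsigned long pos, unsigned dist,
                                     unsigned len) {
    while (len--) {
        window[pos % wsize] = window[(pos - dist) % wsize];
        pos++;
    }
    return pos;
}

/* Skip variant: identical position bookkeeping, no memory traffic.
   This is the part that must stay in lockstep with the copying code. */
static unsigned long emit_match_skip(unsigned long pos, unsigned len) {
    return pos + len;
}
```

The point is that both variants must advance the stream position identically; only the stores into the window disappear.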
-
@mtl1979 I'd be happy to give that a try and benchmark it. (I'm sure there's a more maintainable way to avoid the code duplication, but it'll help to have numbers for how much this could help first.)
-
@joshtriplett We have used various tricks to avoid code duplication elsewhere in the code, but for initial testing it is easier to just duplicate the code... When there are just two alternatives, it doesn't make sense to use macros to hide the differences, especially when the code already uses a lot of macros...
-
Some preliminary results using a large (914MB compressed, 2592MB uncompressed) DEFLATE stream:
Not as much of a speedup as I'd expected, but still fast enough to be worthwhile, and it seems quite likely that it could be optimized further. Here's the WIP patch/hack: zlib-ng-skipout.patch.txt
Thoughts?
Also, in the course of working on this, I found an unrelated optimization, which I'll send a PR for.
-
I am noticing that in
Also, I am wondering: what is the size of the input buffer given to inflateBack()?
Nice work on the PR. I will try and do some performance testing on it soon.
-
@nmoinvaz I gave inflateBack all of the input data at once, and made the input callback a no-op.
-
@joshtriplett I have done some benchmarking on the PR. Do you plan on trying to take out the copying of data to
-
@nmoinvaz I didn't touch the
-
I ran across this in the zlib FAQ, which may be related/useful:
... Alternatively, you can scan a deflate stream once to generate an index, and then use that index for random access.
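The index the FAQ alludes to is fleshed out in zlib's examples/zran.c. The core idea is a list of access points, each pairing a compressed offset with an uncompressed offset plus the 32K of history needed to restart inflation there via inflateSetDictionary(). A rough sketch of the data structure and lookup (field and function names are mine, not zran.c's):

```c
#include <assert.h>
#include <stddef.h>

#define WINSIZE 32768U  /* DEFLATE's maximum back-reference distance */

/* One point in the stream where inflation can be restarted. */
typedef struct {
    unsigned long comp_off;        /* position in the compressed stream  */
    unsigned long uncomp_off;      /* corresponding uncompressed offset  */
    unsigned char window[WINSIZE]; /* history to prime inflateSetDictionary() */
} access_point;

/* Find the last access point at or before the wanted uncompressed
   offset, or NULL if the offset precedes every point. */
static const access_point *find_point(const access_point *idx, size_t n,
                                      unsigned long want) {
    const access_point *best = NULL;
    for (size_t i = 0; i < n; i++)
        if (idx[i].uncomp_off <= want)
            best = &idx[i];
    return best;
}
```

Building such an index still requires one full decompression pass, which is exactly the pass this feature request wants to make cheaper.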
-
I repeated performance tests related to deflate decompression performance on Apple Silicon (MacBook Air) using the system-provided
See this comment for more details.
To my mind, the efficient handling of decompressing small streams is vital to further improving on
It is surprising to me how git can be this fast, given that it appears to only use zlib without any special handling.
Edit: This morning I woke up realising that
-
Some file formats, such as the git packfile format, have a series of DEFLATE-compressed streams within them, and don't include lengths that would allow skipping over them to the next piece of data. Git's index file format provides those offsets, but when creating that index file, you have to walk over the entire pack file, inflating each object sequentially, in order to find where the next object starts. That's a major bottleneck in some common git operations.
Would it be possible to have a function in zlib-ng that would do the fastest possible inflate of a DEFLATE-compressed stream without actually reconstructing the decompressed data? This could just decode the block headers, decode any dynamic Huffman tables, skip over literal symbols and match/distance pairs, and look for the end-of-block symbol. No decompressed data, no window of data for backreferences, no copy operations, just looking for the end of each block as fast as possible.
If the resulting function ended up being substantially faster, would that be a reasonable thing to add?