LZB is a tiny byte-aligned implementation of LZ compression algorithm written in C.
LZB is aimed to be used for MCU firmware compression. We use a minimum match length of two bytes to achieve high compression ratio on binary data.
ARMv7 libstdc++.so.6.0.30
Compressor | Time | Ratio |
---|---|---|
lzb -0 | 0.161 s | 49.69 % |
lzb -9 | 7.254 s | 41.66 % |
lz4 -T1 -0 | 0.048 s | 55.53 % |
lz4 -T1 -12 | 1.175 s | 45.68 % |
zip -1 | 0.235 s | 41.00 % |
zip -9 | 1.756 s | 37.55 % |
Compressed data consist of series of blocks with layout byte in leading.
[ layout byte ] [ match or literal bytes ] [ layout byte ] ...
There is two types of block. The mixed block contains 7 matches or literals depending on bitmask value in layout byte.
[ 1xxxxxxx ] [ match if `1` or literal if `0` ] ...
The literal block contains only literal bytes and has up to 127 + 7 bytes depending on last 7 bits of layout byte.
[ 0xxxxxxx ] [ literal bytes up to 134 bytes length ]
The match has four types of encoding from 1 to 4 bytes depending on prefix bits
of leading byte. In the following we note n
is the length field and f
is
the offset field.
[ 0nffffff ]
[ 10nnnnff ] [ ffffffff ]
[ 11nnnnnn ] [ ffffffff ] [ ffffffff ]
[ 11111111 ] [ ffffffff ] [ ffffffff ] [ nnnnnnnn ]
One byte encoding gives 2-3 match length and up to 64 bytes offset. Two byte encoding gives 3-18 match length and up to 1024 bytes offset. Three (and four) byte encoding gives 4-66 (67-322) match length and up to 64k offset.
In the best case compression ratio can reach (4*7+1)/(322*7)
= ~1.29%. In the
worst case we have ratio (134+1)/134
= ~100.75%.
Experimental.