Abstract
Data compression algorithms typically rely on identifying repeated sequences of symbols in the original data to provide a compact representation of the same information, while retaining the ability to recover the original data from the compressed sequence. Applying a data transformation before compression can enhance compression, and the overall process remains lossless as long as the transformation is invertible. Floating point data makes it uniquely challenging to design invertible transformations with high compression potential. This paper identifies key conditions on basic floating point operations that guarantee lossless transformations. We then present four methods that use these observations to deliver lossless compression of real datasets, improving compression rates by up to 40%.
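As a rough illustration of why invertibility matters for floating point data, the following minimal sketch checks whether adding a constant offset to a list of doubles can be undone bit for bit before handing the shifted values to a general-purpose compressor. It is not one of the paper's four methods; the function names (`shift_is_lossless`, `compress_shifted`, `decompress_shifted`), the choice of an additive offset, and the use of zlib are assumptions made only for this example.

```python
import struct
import zlib


def pack(values):
    """Serialize a list of Python floats as little-endian IEEE 754 doubles."""
    return struct.pack(f"<{len(values)}d", *values)


def unpack(blob):
    """Inverse of pack(): bytes back to a list of floats."""
    return list(struct.unpack(f"<{len(blob) // 8}d", blob))


def shift_is_lossless(values, offset):
    """True if (v + offset) - offset recovers every v bit for bit.

    Comparing packed bytes (rather than using ==) also catches cases
    such as -0.0 turning into +0.0.
    """
    shifted = [v + offset for v in values]
    restored = [s - offset for s in shifted]
    return pack(restored) == pack(values)


def compress_shifted(values, offset):
    """Shift the values, then compress; refuse if the shift would lose bits."""
    if not shift_is_lossless(values, offset):
        raise ValueError("offset is not invertible for this data")
    return zlib.compress(pack([v + offset for v in values]))


def decompress_shifted(blob, offset):
    """Decompress and undo the shift (hypothetical counterpart to compress_shifted)."""
    return [s - offset for s in unpack(zlib.decompress(blob))]


if __name__ == "__main__":
    data = [0.5, 0.25, 1.5, 0.75]
    print(shift_is_lossless(data, 1.0))     # exactly representable sums
    print(shift_is_lossless([0.1], 1e16))   # rounding absorbs the small value
    blob = compress_shifted(data, 1.0)
    print(decompress_shifted(blob, 1.0) == data)
```

The explicit round-trip check is the point of the sketch: floating point addition rounds its result, so a transformation that is invertible over the reals may silently discard low-order bits, and the check rejects such an offset instead of producing a lossy result.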
This work was supported by the IoTalentum Project within the Framework of Marie Skłodowska-Curie Actions Innovative Training Networks (ITN)-European Training Networks (ETN), which is funded by the European Union Horizon 2020 Research and Innovation Program under Grant 953442.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Taurone, F., Lucani, D.E., Fehér, M., Zhang, Q. (2023). Lossless Preprocessing of Floating Point Data to Enhance Compression. In: Mehmood, R., et al. Distributed Computing and Artificial Intelligence, Special Sessions I, 20th International Conference. DCAI 2023. Lecture Notes in Networks and Systems, vol 741. Springer, Cham. https://doi.org/10.1007/978-3-031-38318-2_45
DOI: https://doi.org/10.1007/978-3-031-38318-2_45
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-38317-5
Online ISBN: 978-3-031-38318-2
eBook Packages: Intelligent Technologies and Robotics (R0)