Abstract
While raw images possess distinct advantages over sRGB images, e.g., linearity and fine-grained quantization levels, they are not widely adopted by general users due to their substantial storage requirements. Recent studies propose to compress raw images by designing sampling masks in the pixel space of the raw image. However, these approaches leave room for more effective image representations and more compact metadata. In this work, we propose a novel framework that learns a compact representation in the latent space, serving as metadata, in an end-to-end manner. Compared with lossy image compression, we analyze the intrinsic difference of the raw image reconstruction task caused by the rich information available from the sRGB image. Based on this analysis, we propose a novel backbone design with asymmetric and hybrid spatial feature resolutions, which significantly improves rate-distortion performance. Besides, we propose a novel sRGB-guided context model, which better predicts the order masks for encoding/decoding based on both the sRGB image and the masks of already processed features. Benefiting from better modeling of the correlation between order masks, the already processed information can be better utilized. Moreover, a novel sRGB-guided adaptive quantization precision strategy, which dynamically assigns varying levels of quantization precision to different regions, further enhances the representation ability of the model. Finally, exploiting the iterative nature of the proposed context model, we propose a novel strategy to achieve variable bit rates with a single model, enabling continuous coverage of a wide range of bit rates. We demonstrate how our raw image compression scheme effectively allocates more bits to image regions that hold greater global importance.
Extensive experimental results validate the superior performance of the proposed method, which achieves high-quality raw image reconstruction with a smaller metadata size than existing state-of-the-art methods.
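To make the sRGB-guided adaptive quantization precision idea concrete, the following is a minimal sketch, not the authors' actual implementation: it assumes an importance map in [0, 1] derived from the sRGB image and a hypothetical set of step sizes (`steps`), and quantizes each latent element with a finer step where importance is high.

```python
import numpy as np

def adaptive_quantize(latent, importance, steps=(0.25, 0.5, 1.0)):
    """Quantize a latent tensor with spatially varying precision.

    Regions with higher importance (e.g., derived from the sRGB image)
    receive a smaller quantization step, i.e., finer precision.
    `importance` lies in [0, 1]; `steps` is a hypothetical list of step
    sizes ordered from finest to coarsest.
    """
    n = len(steps)
    # Map importance to a step index: high importance -> small (fine) step.
    idx = np.clip(((1.0 - importance) * n).astype(int), 0, n - 1)
    step = np.asarray(steps)[idx]
    # Uniform rounding with the per-element step size.
    return np.round(latent / step) * step

latent = np.array([[0.37, -1.12], [0.88, 0.05]])
importance = np.array([[0.9, 0.1], [0.5, 0.2]])
q = adaptive_quantize(latent, importance)
# High-importance entries are quantized with step 0.25;
# low-importance entries with step 1.0.
```

In an end-to-end trained codec, the hard rounding above would be replaced by a differentiable surrogate (e.g., additive uniform noise) during training.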
Data availability
This work does not propose a new dataset. All the datasets we used are publicly available.
Funding
This work was done at Rapid-Rich Object Search (ROSE) Lab, Nanyang Technological University. This research is supported in part by the NTU-PKU Joint Research Institute (a collaboration between the Nanyang Technological University and Peking University that is sponsored by a donation from the Ng Teng Fong Charitable Foundation), the Basic and Frontier Research Project of PCL, the Major Key Project of PCL, and the MOE AcRF Tier 1 (RG61/22) and Start-Up Grant.
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Code availability
The code of this work is released at https://github.com/wyf0912/R2LCM.
Additional information
Communicated by Seon Joo Kim.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Wang, Y., Yu, Y., Yang, W. et al. Beyond Learned Metadata-Based Raw Image Reconstruction. Int J Comput Vis 132, 5514–5533 (2024). https://doi.org/10.1007/s11263-024-02143-2