Abstract
While raw images possess distinct advantages over sRGB images, e.g., linearity and fine-grained quantization levels, they are not widely adopted by general users due to their substantial storage requirements. Recent studies propose to compress raw images by designing sampling masks in the pixel space of the raw image. However, these approaches leave room for more effective image representations and more compact metadata. In this work, we propose a novel framework that learns a compact representation in the latent space, serving as metadata, in an end-to-end manner. Compared with lossy image compression, we analyze the intrinsic difference of the raw image reconstruction task caused by the rich information available from the sRGB image. Based on this analysis, we propose a novel backbone design with asymmetric and hybrid spatial feature resolutions, which significantly improves rate-distortion performance. Besides, we propose a novel sRGB-guided context model, which better predicts the order masks for encoding/decoding based on both the sRGB image and the masks of already processed features. Benefiting from better modeling of the correlation between order masks, the already processed information can be better utilized. Moreover, a novel sRGB-guided adaptive quantization precision strategy, which dynamically assigns varying levels of quantization precision to different regions, further enhances the representation ability of the model. Finally, exploiting the iterative nature of the proposed context model, we propose a novel strategy to achieve variable bit rates with a single model, enabling continuous coverage of a wide range of bit rates. We demonstrate how our raw image compression scheme effectively allocates more bits to image regions that hold greater global importance.
Extensive experimental results validate the superior performance of the proposed method, which achieves high-quality raw image reconstruction with a smaller metadata size than existing state-of-the-art methods.
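To make the sRGB-guided adaptive quantization precision idea concrete, the following is a minimal sketch, not the authors' actual implementation: it assumes an importance map in [0, 1] derived from the sRGB image and a hypothetical set of step sizes (`steps`), and quantizes each latent element with a finer step where importance is high.

```python
import numpy as np

def adaptive_quantize(latent, importance, steps=(0.25, 0.5, 1.0)):
    """Quantize a latent tensor with spatially varying precision.

    Regions with higher importance (e.g., derived from the sRGB image)
    receive a smaller quantization step, i.e., finer precision.
    `importance` lies in [0, 1]; `steps` is a hypothetical list of step
    sizes ordered from finest to coarsest.
    """
    n = len(steps)
    # Map importance to a step index: high importance -> small (fine) step.
    idx = np.clip(((1.0 - importance) * n).astype(int), 0, n - 1)
    step = np.asarray(steps)[idx]
    # Uniform rounding with the per-element step size.
    return np.round(latent / step) * step

latent = np.array([[0.37, -1.12], [0.88, 0.05]])
importance = np.array([[0.9, 0.1], [0.5, 0.2]])
q = adaptive_quantize(latent, importance)
# High-importance entries are quantized with step 0.25;
# low-importance entries with step 1.0.
```

In an end-to-end trained codec, the hard rounding above would be replaced by a differentiable surrogate (e.g., additive uniform noise) during training.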
Data availability
This work does not propose a new dataset. All the datasets we used are publicly available.
Funding
This work was done at Rapid-Rich Object Search (ROSE) Lab, Nanyang Technological University. This research is supported in part by the NTU-PKU Joint Research Institute (a collaboration between the Nanyang Technological University and Peking University that is sponsored by a donation from the Ng Teng Fong Charitable Foundation), the Basic and Frontier Research Project of PCL, the Major Key Project of PCL, and the MOE AcRF Tier 1 (RG61/22) and Start-Up Grant.
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Code availability
The code of this work is released at https://github.com/wyf0912/R2LCM.
Additional information
Communicated by Seon Joo Kim.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Wang, Y., Yu, Y., Yang, W. et al. Beyond Learned Metadata-Based Raw Image Reconstruction. Int J Comput Vis 132, 5514–5533 (2024). https://doi.org/10.1007/s11263-024-02143-2