Abstract
In this paper, we address the problem of spatially-varying illumination-aware indoor harmonization. Existing image harmonization works either extract only 2D information (e.g., low-level statistics or image filters) from the background image or rely on the non-linear representations of deep neural networks to adjust the foreground appearance. From a physical point of view, however, realistic image harmonization requires perceiving the illumination at the foreground position in the scene (i.e., Spatially-Varying (SV) illumination), which is especially important for indoor scenes. To address indoor harmonization, we present a novel learning-based framework that attempts to mimic the physical model of image formation. The framework consists of a new neural harmonization architecture with four compact neural modules, which jointly learn SV illumination, shading, albedo, and rendering. In particular, a multilayer perceptron-based neural illumination field is designed to recover illumination with finer details. In addition, we construct the first large-scale synthetic indoor harmonization benchmark dataset, in which the foreground objects are primarily humans and are rendered and perturbed by SV illumination. An object placement formula is also derived to ensure that the foreground object is placed in the background at a reasonable size. Extensive experiments on synthetic and real data demonstrate that our approach achieves better results than prior works.
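The abstract does not give implementation details, so the following minimal PyTorch sketch only illustrates two of the ideas it names: an MLP-based neural illumination field queried at 3D positions (here with a Fourier positional encoding in the spirit of Tancik et al., 2020) and a Lambertian-style recomposition of predicted albedo and shading. All module names, layer sizes, and the encoding choice are illustrative assumptions, not the authors' architecture.

```python
import torch
import torch.nn as nn

def fourier_encode(x, num_freqs=6):
    # Lift 3D illumination queries to a higher-dimensional space so that a small
    # MLP can represent high-frequency lighting detail (assumed encoding).
    freqs = 2.0 ** torch.arange(num_freqs, dtype=x.dtype, device=x.device)
    angles = x[..., None] * freqs                      # (..., 3, num_freqs)
    enc = torch.cat([torch.sin(angles), torch.cos(angles)], dim=-1)
    return enc.flatten(start_dim=-2)                   # (..., 3 * 2 * num_freqs)

class NeuralIlluminationField(nn.Module):
    # Hypothetical MLP mapping an encoded 3D query to non-negative HDR RGB radiance,
    # i.e., a continuous stand-in for a spatially-varying environment map.
    def __init__(self, num_freqs=6, hidden=128):
        super().__init__()
        in_dim = 3 * 2 * num_freqs
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3), nn.Softplus(),       # keep radiance non-negative
        )

    def forward(self, query):
        return self.mlp(fourier_encode(query))

def recompose(albedo, shading):
    # Intrinsic recomposition: harmonized foreground as the per-pixel product of
    # predicted albedo and shading (a common approximation, assumed here).
    return albedo * shading                            # (B, 3, H, W) each

# Example query: radiance at 1024 sampled positions/directions around the foreground.
field = NeuralIlluminationField()
radiance = field(torch.randn(1024, 3))                 # -> (1024, 3) HDR RGB
```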
Data Availability
Both our large-scale indoor harmonization benchmark dataset and real evaluation dataset are available at https://github.com/waldenlakes/IndoorHarmony-Dataset. The raw data used to construct our datasets are available from the Laval Indoor HDR dataset (Gardner et al., 2017), the Replica dataset (Straub et al., 2019), Poly Haven (Poly Haven, 2021), HDR MAPS (HDR MAPS, 2020), 3D People (3D People, 2022), and Taobao (Taobao, 2018).
References
3D People. (2022). https://3dpeople.com
Ashikhmin, M., Premože, S., & Shirley, P. (2000). A microfacet-based BRDF generator. In: Proceedings of the 27th Annual Conference on Computer Graphics and Interactive Techniques, (pp. 65–74).
Bao, Z., Long, C., Fu, G., Liu, D., Li, Y., Wu, J., & Xiao, C. (2022). Deep image-based illumination harmonization. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, (pp. 18542–18551).
Bolduc, C., Giroux, J., Hébert, M., Demers, C., & Lalonde, J.-F. (2023). Beyond the pixel: A photometrically calibrated HDR dataset for luminance and color prediction. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), (pp. 8071–8081).
Cao, J., Cong, W., Niu, L., Zhang, J., & Zhang, L. (2022). Deep image harmonization by bridging the reality gap. In: BMVC.
Catmull, E. E. (1974). A subdivision algorithm for computer display of curved surfaces. PhD thesis, University of Utah.
Chen, W., Wang, W., Yang, W., & Liu, J. (2018). Deep retinex decomposition for low-light enhancement. In: British Machine Vision Conference
Cignoni, P., Callieri, M., Corsini, M., Dellepiane, M., Ganovelli, F., Ranzuglia, G., et al. (2008). Meshlab: an open-source mesh processing tool. In: Eurographics Italian Chapter Conference, vol. 2008, (pp. 129–136). Salerno, Italy
Cohen-Or, D., Sorkine, O., Gal, R., Leyvand, T., & Xu, Y.-Q. (2006). Color harmonization. In: ACM SIGGRAPH 2006 Papers, (pp. 624–630).
Blender Online Community. (2018). Blender - a 3D Modelling and Rendering Package. Stichting Blender Foundation, Amsterdam. http://www.blender.org
Cong, W., Niu, L., Zhang, J., Liang, J., & Zhang, L. (2021). BargainNet: Background-guided domain translation for image harmonization. In: 2021 IEEE International Conference on Multimedia and Expo (ICME), IEEE, (pp. 1–6).
Cong, W., Tao, X., Niu, L., Liang, J., Gao, X., Sun, Q., & Zhang, L. (2022). High-resolution image harmonization via collaborative dual transformations. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, (pp. 18470–18479).
Cong, W., Zhang, J., Niu, L., Liu, L., Ling, Z., Li, W., & Zhang, L. (2020). DoveNet: Deep image harmonization via domain verification. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition
Cun, X., & Pun, C.-M. (2020). Improving the harmony of the composite image by spatial-separated attention module. IEEE Transactions on Image Processing, 29, 4759–4771.
Das, P., Karaoglu, S., & Gevers, T. (2022). Pie-net: Photometric invariant edge guided network for intrinsic image decomposition. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, (pp. 19790–19799).
Debevec, P., Hawkins, T., Tchou, C., Duiker, H.-P., Sarokin, W., & Sagar, M. (2000). Acquiring the reflectance field of a human face. In: Proceedings of the 27th Annual Conference on Computer Graphics and Interactive Techniques, (pp. 145–156).
El Helou, M., Zhou, R., Süsstrunk, S., Timofte, R., Afifi, M., Brown, M. S., Xu, K., Cai, H., Liu, Y., Wang, L.-W., Liu, Z.-S., Li, C.-T., Dipta Das, S., Shah, N. A., Jassal, A., Zhao, T., Zhao, S., Nathan, S., Beham, M. P., & Cheng, J. (2020). Aim 2020: Scene relighting and illumination estimation challenge. In A. Bartoli & A. Fusiello (Eds.), Computer Vision - ECCV 2020 Workshops (pp. 499–518). Springer.
Fan, Q., Yang, J., Hua, G., Chen, B., & Wipf, D. (2018). Revisiting deep intrinsic image decompositions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (pp. 8944–8952).
Gardner, M.-A., Hold-Geoffroy, Y., Sunkavalli, K., Gagné, C., & Lalonde, J.-F. (2019). Deep parametric indoor lighting estimation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, (pp. 7175–7183).
Gardner, M.-A., Sunkavalli, K., Yumer, E., Shen, X., Gambaretto, E., Gagné, C., & Lalonde, J.-F. (2017). Learning to predict indoor illumination from a single image. ACM Transactions on Graphics, 36(6), 1–14.
Garon, M., Sunkavalli, K., Hadap, S., Carr, N., & Lalonde, J.-F. (2019). Fast spatially-varying indoor lighting estimation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, (pp. 6908–6917).
Guerreiro, J.J.A., Nakazawa, M., & Stenger, B. (2023). Pct-net: Full resolution image harmonization using pixel-wise color transformations. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, (pp. 5917–5926).
Guo, Z., Guo, D., Zheng, H., Gu, Z., Zheng, B., & Dong, J. (2021). Image harmonization with transformer. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, (pp. 14870–14879).
Guo, Z., Zheng, H., Jiang, Y., Gu, Z., & Zheng, B. (2021). Intrinsic image harmonization. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, (pp. 16367–16376).
Guo, Z., Gu, Z., Zheng, B., Dong, J., & Zheng, H. (2022). Transformer for image harmonization and beyond. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45, 12960–12977.
Hang, Y., Xia, B., Yang, W., & Liao, Q. (2022). Scs-co: Self-consistent style contrastive learning for image harmonization. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, (pp. 19710–19719).
Hao, G., Iizuka, S., & Fukui, K. (2020). Image harmonization with attention-based deep feature modulation. In: BMVC
HDR MAPS. (2020). https://hdrmaps.com/
He, K., Zhang, X., Ren, S., & Sun, J. (2015). Delving deep into rectifiers: Surpassing human-level performance on imagenet classification. In: Proceedings of the IEEE International Conference on Computer Vision, (pp. 1026–1034).
Hu, Z., Nsampi, N. E., Wang, X., & Wang, Q. (2022). PNRNet: Physically-inspired neural rendering for any-to-any relighting. IEEE Transactions on Image Processing, 31, 3935–3948.
Jiang, Y., Zhang, H., Zhang, J., Wang, Y., Lin, Z., Sunkavalli, K., Chen, S., Amirghodsi, S., Kong, S., & Wang, Z. (2021). Ssh: A self-supervised framework for image harmonization. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, (pp. 4832–4841).
Jia, J., Sun, J., Tang, C.-K., & Shum, H.-Y. (2006). Drag-and-drop pasting. ACM Transactions on Graphics (SIGGRAPH), 25, 631–637.
Kajiya, J.T. (1986). The rendering equation. In: Proceedings of the 13th Annual Conference on Computer Graphics and Interactive Techniques, (pp. 143–150).
Kanamori, Y., & Endo, Y. (2018). Relighting humans: Occlusion-aware inverse rendering for full-body human images. ACM Transactions on Graphics, 37(6), 1–11.
Ke, Z., Sun, C., Zhu, L., Xu, K., & Lau, R.W. (2022). Harmonizer: Learning to perform white-box image and video harmonization. In: European Conference on Computer Vision. Springer
Labelme. (2016). https://github.com/wkentaro/labelme
Lagunas, M., Sun, X., Yang, J., Villegas, R., Zhang, J., Shu, Z., Masia, B., & Gutierrez, D. (2021). Single-image full-body human relighting. arXiv preprint arXiv:2107.07259
Lalonde, J.-F., & Efros, A.A. (2007). Using color compatibility for assessing image realism. In: 2007 IEEE 11th International Conference on Computer Vision, IEEE, (pp. 1–8).
LeGendre, C., Ma, W.-C., Fyffe, G., Flynn, J., Charbonnel, L., Busch, J., & Debevec, P. (2019). Deeplight: Learning illumination for unconstrained mobile mixed reality. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, (pp. 5918–5928).
Li, Z., & Snavely, N. (2018). Learning intrinsic image decomposition from watching the world. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (pp. 9039–9048).
Li, Z., Shafiei, M., Ramamoorthi, R., Sunkavalli, K., & Chandraker, M. (2020). Inverse rendering for complex indoor scenes: Shape, spatially-varying lighting and SVBRDF from a single image. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, (pp. 2475–2484).
Li, Z., Shi, J., Bi, S., Zhu, R., Sunkavalli, K., Hašan, M., Xu, Z., Ramamoorthi, R., & Chandraker, M. (2022). Physically-based editing of indoor scene lighting from a single image. In: Computer Vision–ECCV 2022: 17th European Conference, Tel Aviv, Israel, Oct 23–27, 2022, Proceedings, Part VI, (pp. 555–572). Springer
Liang, J., Cun, X., & Pun, C.-M. (2022). Spatial-separated curve rendering network for efficient and high-resolution image harmonization. In: European Conference on Computer Vision. Springer
Ling, J., Xue, H., Song, L., Xie, R., & Gu, X. (2021). Region-aware adaptive instance normalization for image harmonization. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, (pp. 9361–9370).
Liu, S., Huynh, C.P., Chen, C., Arap, M., & Hamid, R. (2023). Lemart: Label-efficient masked region transform for image harmonization. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, (pp. 18290–18299).
Liu, Y., Li, Y., You, S., & Lu, F. (2020). Unsupervised learning for intrinsic image decomposition from a single image. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, (pp. 3248–3257).
Meka, A., Haene, C., Pandey, R., Zollhöfer, M., Fanello, S., Fyffe, G., Kowdle, A., Yu, X., Busch, J., Dourgarian, J., et al. (2019). Deep reflectance fields: High-quality facial reflectance field inference from color gradient illumination. ACM Transactions on Graphics, 38(4), 1–12.
Mildenhall, B., Srinivasan, P.P., Tancik, M., Barron, J.T., Ramamoorthi, R., & Ng, R. (2020). Nerf: Representing scenes as neural radiance fields for view synthesis. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, Aug 23–28, 2020, Proceedings, Part I 16, (pp. 405–421). Springer
Narihira, T., Maire, M., & Yu, S.X. (2015). Direct intrinsics: Learning albedo-shading decomposition by convolutional regression. In: Proceedings of the IEEE International Conference on Computer Vision, (pp. 2992–3000).
Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L., et al. (2019). PyTorch: An imperative style, high-performance deep learning library. Advances in Neural Information Processing Systems, 32.
Pérez, P., Gangnet, M., & Blake, A. (2003). Poisson image editing. In: ACM SIGGRAPH 2003 Papers, (pp. 313–318).
Pitié, F., Kokaram, A. C., & Dahyot, R. (2005). N-dimensional probability density function transfer and its application to color transfer. In: Tenth IEEE International Conference on Computer Vision (ICCV'05), Vols. 1–2, IEEE, (pp. 1434–1439).
Pitié, F., Kokaram, A. C., & Dahyot, R. (2007). Automated colour grading using colour distribution transfer. Computer Vision and Image Understanding, 107(1–2), 123–137.
Poly Haven. (2021). https://polyhaven.com/hdris
Reinhard, E., Adhikhmin, M., Gooch, B., & Shirley, P. (2001). Color transfer between images. IEEE Computer Graphics and Applications, 21(5), 34–41.
Ren, X., & Liu, Y. (2022). Semantic-guided multi-mask image harmonization. In: European Conference on Computer Vision. Springer
Sato, S., Yao, Y., Yoshida, T., Kaneko, T., Ando, S., & Shimamura, J. (2023). Unsupervised intrinsic image decomposition with lidar intensity. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, (pp. 13466–13475).
Sofiiuk, K., Popenova, P., & Konushin, A. (2021). Foreground-aware semantic representations for image harmonization. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, (pp. 1620–1629).
Somanath, G., & Kurz, D. (2021). Hdr environment map estimation for real-time augmented reality. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, (pp. 11298–11306).
Srinivasan, P.P., Mildenhall, B., Tancik, M., Barron, J.T., Tucker, R., & Snavely, N. (2020). Lighthouse: Predicting lighting volumes for spatially-coherent illumination. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, (pp. 8080–8089).
Straub, J., Whelan, T., Ma, L., Chen, Y., Wijmans, E., Green, S., Engel, J.J., Mur-Artal, R., Ren, C., Verma, S., Clarkson, A., Yan, M., Budge, B., Yan, Y., Pan, X., Yon, J., Zou, Y., Leon, K., Carter, N., Briales, J., Gillingham, T., Mueggler, E., Pesqueira, L., Savva, M., Batra, D., Strasdat, H.M., Nardi, R.D., Goesele, M., Lovegrove, S., & Newcombe, R. (2019). The Replica dataset: A digital replica of indoor spaces. arXiv preprint arXiv:1906.05797
Sun, T., Barron, J. T., Tsai, Y.-T., Xu, Z., Yu, X., Fyffe, G., Rhemann, C., Busch, J., Debevec, P. E., & Ramamoorthi, R. (2019). Single image portrait relighting. ACM Transactions on Graphics, 38(4), 79–1.
Sunkavalli, K., Johnson, M. K., Matusik, W., & Pfister, H. (2010). Multi-scale image harmonization. ACM Transactions on Graphics, 29(4), 1–10.
Szot, A., Clegg, A., Undersander, E., Wijmans, E., Zhao, Y., Turner, J., Maestre, N., Mukadam, M., Chaplot, D., Maksymets, O., Gokaslan, A., Vondrus, V., Dharur, S., Meier, F., Galuba, W., Chang, A., Kira, Z., Koltun, V., Malik, J., Savva, M., & Batra, D. (2021). Habitat 2.0: Training home assistants to rearrange their habitat. In: Advances in Neural Information Processing Systems (NeurIPS)
Tancik, M., Srinivasan, P., Mildenhall, B., Fridovich-Keil, S., Raghavan, N., Singhal, U., Ramamoorthi, R., Barron, J., & Ng, R. (2020). Fourier features let networks learn high frequency functions in low dimensional domains. Advances in Neural Information Processing Systems, 33, 7537–7547.
Tang, J., Zhu, Y., Wang, H., Chan, J.H., Li, S., & Shi, B. (2022). Estimating spatially-varying lighting in urban scenes with disentangled representation. In: Computer Vision–ECCV 2022: 17th European Conference, Tel Aviv, Israel, Oct 23–27, 2022, Proceedings, Part VI, (pp. 454–469). Springer
Taobao. (2018). https://www.taobao.com/
Tsai, Y.-H., Shen, X., Lin, Z., Sunkavalli, K., Lu, X., & Yang, M.-H. (2017). Deep image harmonization. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (pp. 3789–3797).
Valanarasu, J.M.J., Zhang, H., Zhang, J., Wang, Y., Lin, Z., Echevarria, J., Ma, Y., Wei, Z., Sunkavalli, K., & Patel, V.M. (2023). Interactive portrait harmonization. In: International Conference on Learning Representations
Wang, K., Gharbi, M., Zhang, H., Xia, Z., & Shechtman, E. (2023). Semi-supervised parametric real-world image harmonization. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, (pp. 5927–5936).
Wang, Z., Philion, J., Fidler, S., & Kautz, J. (2021). Learning indoor inverse rendering with 3d spatially-varying lighting. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, (pp. 12538–12547).
Wang, Z., Bovik, A. C., Sheikh, H. R., & Simoncelli, E. P. (2004). Image quality assessment: From error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4), 600–612.
Wang, Z., Yu, X., Lu, M., Wang, Q., Qian, C., & Xu, F. (2020). Single image portrait relighting via explicit multiple reflectance channel modeling. ACM Transactions on Graphics, 39(6), 1–13.
Weber, H., Garon, M., & Lalonde, J.-F. (2022). Editable indoor lighting estimation. In: Computer Vision–ECCV 2022: 17th European Conference, Tel Aviv, Israel, Oct 23–27, 2022, Proceedings, Part VI, (pp. 677–692). Springer
Xie, Y., Takikawa, T., Saito, S., Litany, O., Yan, S., Khan, N., Tombari, F., Tompkin, J., Sitzmann, V., & Sridhar, S. (2022). Neural fields in visual computing and beyond. Computer Graphics Forum, 41, 641–676.
Xu, K., Hancke, G.P., & Lau, R.W.H. (2023). Learning image harmonization in the linear color space. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), (pp. 12570–12579).
Xue, B., Ran, S., Chen, Q., Jia, R., Zhao, B., & Tang, X. (2022). DCCF: Deep comprehensible color filter learning framework for high-resolution image harmonization. In: European Conference on Computer Vision. Springer
Xue, S., Agarwala, A., Dorsey, J., & Rushmeier, H. (2012). Understanding and improving the realism of image composites. ACM Transactions on Graphics, 31(4), 1–10.
Xu, Z., Sunkavalli, K., Hadap, S., & Ramamoorthi, R. (2018). Deep image-based relighting from optimal sparse samples. ACM Transactions on Graphics, 37(4), 1–13.
Zhan, F., Zhang, C., Hu, W., Lu, S., Ma, F., Xie, X., & Shao, L. (2021). Sparse needlets for lighting estimation with spherical transport loss. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, (pp. 12830–12839).
Zhang, J., & Lalonde, J.-F. (2017). Learning high dynamic range from outdoor panoramas. In: Proceedings of the IEEE International Conference on Computer Vision, (pp. 4519–4528).
Zhang, R., Isola, P., Efros, A.A., Shechtman, E., & Wang, O. (2018). The unreasonable effectiveness of deep features as a perceptual metric. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition
Zhao, H., Gallo, O., Frosio, I., & Kautz, J. (2016). Loss functions for image restoration with neural networks. IEEE Transactions on Computational Imaging, 3(1), 47–57.
Zhou, H., Hadap, S., Sunkavalli, K., & Jacobs, D.W. (2019). Deep single-image portrait relighting. In: Proceedings of the IEEE International Conference on Computer Vision, (pp. 7194–7202).
Zhou, H., Yu, X., & Jacobs, D.W. (2019). Glosh: Global-local spherical harmonics for intrinsic image decomposition. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, (pp. 7820–7829).
Zhu, Y., Tang, J., Li, S., & Shi, B. (2021). DerenderNet: Intrinsic image decomposition of urban scenes with shape-(in)dependent shading rendering. In: 2021 IEEE International Conference on Computational Photography (ICCP), IEEE, (pp. 1–11).
Zhu, Y., Zhang, Y., Li, S., & Shi, B. (2021). Spatially-varying outdoor lighting estimation from intrinsics. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 12834–12842).
Acknowledgements
This work was supported by the National Natural Science Foundation of China under Grant 62031023.
Author information
Authors and Affiliations
Corresponding author
Additional information
Communicated by Rynson W.H. Lau.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Hu, Z., Li, J., Wang, X. et al. Spatially-Varying Illumination-Aware Indoor Harmonization. Int J Comput Vis 132, 2473–2492 (2024). https://doi.org/10.1007/s11263-024-01994-z