Abstract
Sequential facial image editing suffers from three problems: discontinuous editing, inconsistent editing, and irreversible editing. Discontinuous editing means that the current edit cannot retain previously edited attributes. Inconsistent editing means that swapping the order of attribute edits does not yield the same result. Irreversible editing means that an operation on a facial image cannot be undone, which is especially problematic in sequential editing. In this work, we put forward three concepts and their corresponding definitions: editing continuity, consistency, and reversibility. Continuity refers to the continuity of attributes, i.e., an attribute can be edited continuously on any face. Consistency requires not only that attributes satisfy continuity but also that facial identity remains consistent. To achieve editing continuity, consistency, and reversibility, we propose a novel model, and we further define a sufficient criterion for determining whether a model is continuous, consistent, and reversible. Extensive qualitative and quantitative experiments validate the proposed model and show that a continuous, consistent, and reversible editing model offers more flexible editing while preserving facial identity. We believe the proposed definitions and model will have wide and promising applications in multimedia processing. Code and data are available at https://github.com/mickoluan/CCR.
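The consistency and reversibility properties can be illustrated abstractly in latent space. The sketch below is a minimal, hypothetical example (not the paper's model): if attribute edits are modeled as moves along fixed semantic directions in a latent space, as in linear latent-editing approaches, then edits commute (order-independence, i.e., consistency) and can be undone by moving back along the same direction (reversibility). All names and values here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=512)        # hypothetical latent code of a face
d_smile = rng.normal(size=512)  # hypothetical "smile" direction
d_age = rng.normal(size=512)    # hypothetical "age" direction

def edit(latent, direction, alpha):
    """Apply an attribute edit by moving along a semantic direction."""
    return latent + alpha * direction

# Consistency: swapping the editing order yields the same latent code.
ab = edit(edit(w, d_smile, 1.5), d_age, -0.8)
ba = edit(edit(w, d_age, -0.8), d_smile, 1.5)
assert np.allclose(ab, ba)

# Reversibility: applying the opposite step recovers the original code.
restored = edit(edit(w, d_smile, 1.5), d_smile, -1.5)
assert np.allclose(restored, w)
```

Real editing pipelines are generally not linear, which is precisely why these properties must be designed for rather than assumed.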
Acknowledgements
This work was supported by the National Key Research and Development Program of China under Grant 2020YFB1313400; the Youth Innovation Promotion Association of the Chinese Academy of Sciences under Grant Y202051; the CAS Project for Young Scientists in Basic Research under Grant YSBR-041; the National Natural Science Foundation of China under Grants 42306214, 61821005, and 61991413; the Shandong Province Postdoctoral Innovative Talents Support Program (SDBX2022026); the China Postdoctoral Science Foundation (2023M733533); and the Special Research Assistant Project of the Chinese Academy of Sciences in 2022.
Additional information
Communicated by Arun Mallya.
About this article
Cite this article
Yang, N., Luan, X., Jia, H. et al. CCR: Facial Image Editing with Continuity, Consistency and Reversibility. Int J Comput Vis 132, 1336–1349 (2024). https://doi.org/10.1007/s11263-023-01938-z