
CCR: Facial Image Editing with Continuity, Consistency and Reversibility

Published in: International Journal of Computer Vision

Abstract

Sequential facial image editing suffers from three problems: discontinuous editing, inconsistent editing, and irreversible editing. Discontinuous editing means that the current edit cannot retain previously edited attributes. Inconsistent editing means that swapping the order of attribute edits does not yield the same result. Irreversible editing means that an operation on a facial image cannot be undone, which is especially problematic in sequential editing. In this work, we put forward three concepts and their corresponding definitions: editing continuity, consistency, and reversibility. Continuity refers to the continuity of attributes, that is, attributes can be edited continuously on any face. Consistency requires not only that attributes satisfy continuity but also that facial identity remains consistent. We propose a novel model that achieves editing continuity, consistency, and reversibility, and we define a sufficient criterion for determining whether a model is continuous, consistent, and reversible. Extensive qualitative and quantitative experiments validate the proposed model and show that a continuous, consistent, and reversible editing model offers a more flexible editing function while preserving facial identity. We believe the proposed definitions and model will find wide and promising applications in multimedia processing. Code and data are available at https://github.com/mickoluan/CCR.
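
To make these three properties concrete, here is a minimal formalization in the spirit of the abstract's definitions. The notation is illustrative and not necessarily the paper's: E_a denotes the edit operator for attribute a, E_a^{-1} its inverse, alpha an editing strength, and id(·) a facial identity descriptor.

    % A minimal sketch of the three properties; E_a, \alpha, and id(.)
    % are illustrative notation and are not taken from the paper.
    \begin{align*}
      &\text{Continuity:}    && \alpha \mapsto E_a^{\alpha}(x)\ \text{is continuous in the strength } \alpha \text{ for every face } x,\\
      &\text{Consistency:}   && E_a\bigl(E_b(x)\bigr) = E_b\bigl(E_a(x)\bigr) \quad\text{and}\quad \mathrm{id}\bigl(E_a(x)\bigr) = \mathrm{id}(x),\\
      &\text{Reversibility:} && E_a^{-1}\bigl(E_a(x)\bigr) = x.
    \end{align*}

Read together, these say that sequential edits commute (the same result regardless of order), every edit preserves the subject's identity, and every edit can be undone exactly; the discontinuity, inconsistency, and irreversibility problems described above are precisely the failures of these conditions.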



Acknowledgements

This work was supported by the National Key Research and Development Program of China under Grant 2020YFB1313400; the Youth Innovation Promotion Association of the Chinese Academy of Sciences under Grant Y202051; the CAS Project for Young Scientists in Basic Research under Grant YSBR-041; the National Natural Science Foundation of China under Grants 42306214, 61821005, and 61991413; the Shandong Province Postdoctoral Innovative Talents Support Program under Grant SDBX2022026; the China Postdoctoral Science Foundation under Grant 2023M733533; and the Special Research Assistant Project of the Chinese Academy of Sciences in 2022.

Author information


Corresponding authors

Correspondence to Xiaofeng Li or Yandong Tang.

Additional information

Communicated by Arun Mallya.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Yang, N., Luan, X., Jia, H. et al. CCR: Facial Image Editing with Continuity, Consistency and Reversibility. Int J Comput Vis 132, 1336–1349 (2024). https://doi.org/10.1007/s11263-023-01938-z

