Abstract
Synthesizing novel views from a single-view image is a highly ill-posed problem. We present an effective way to reduce the learning ambiguity by expanding the single-view view synthesis problem into a multi-view setting. Specifically, we leverage a reliable and explicit stereo prior to generate a pseudo-stereo viewpoint, which serves as an auxiliary input for constructing the 3D space. In this way, the challenging novel view synthesis process is decoupled into two simpler problems: stereo synthesis and 3D reconstruction. To synthesize a structurally correct and detail-preserving stereo image, we propose a self-rectified stereo synthesis that amends erroneous regions in an identify-rectify manner. Hard-to-train and incorrectly warped samples are first discovered by two strategies: (1) pruning the network to reveal low-confidence predictions, and (2) bidirectionally matching between the stereo images to expose improper mappings. These regions are then inpainted to form the final pseudo-stereo. With the aid of this extra input, a preferable 3D reconstruction can be easily obtained, and our method works with arbitrary 3D representations. Extensive experiments show that our method outperforms state-of-the-art single-view view synthesis and stereo synthesis methods.
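To make the bidirectional-matching idea concrete, the following is a minimal NumPy sketch of a standard left-right disparity consistency check of the kind used to flag improper mappings between a stereo pair. The function name, threshold, and disparity conventions are illustrative assumptions for a rectified pair, not the paper's actual implementation.

```python
import numpy as np

def lr_consistency_mask(disp_left, disp_right, thresh=1.0):
    """Flag pixels whose left- and right-referenced disparities disagree
    (likely occlusions or erroneous warps). Hypothetical helper; disparities
    are in pixels for a rectified stereo pair."""
    h, w = disp_left.shape
    xs = np.arange(w)[None, :].repeat(h, axis=0)   # column index of each left pixel
    ys = np.arange(h)[:, None].repeat(w, axis=1)   # row index of each left pixel
    # Location each left pixel maps to in the right view (x_R = x_L - d_L).
    x_right = np.clip(np.round(xs - disp_left).astype(int), 0, w - 1)
    # Disparity predicted at that right-view location.
    disp_reproj = disp_right[ys, x_right]
    # Pixels where the two disparities disagree are candidates for rectification.
    return np.abs(disp_left - disp_reproj) > thresh
```

In a pipeline of this kind, pixels flagged by such a mask (together with low-confidence regions revealed by pruning) would be handed to an inpainting module before the pseudo-stereo view is used for 3D reconstruction.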
Acknowledgements
This project is supported by the National Natural Science Foundation of China (No. 61972162); Project of Strategic Importance in The Hong Kong Polytechnic University (project no. 1-ZE2Q); Guangdong Natural Science Foundation (No. 2021A1515012625); Guangdong Natural Science Funds for Distinguished Young Scholar (No. 2023B1515020097); Singapore Ministry of Education Academic Research Fund Tier 1 (MSS23C002).
Additional information
Communicated by Boxin Shi, Ph.D.
About this article
Cite this article
Zhou, Y., Wu, H., Liu, W. et al. Single-View View Synthesis with Self-rectified Pseudo-Stereo. Int J Comput Vis 131, 2032–2043 (2023). https://doi.org/10.1007/s11263-023-01803-z