
Single-View View Synthesis with Self-rectified Pseudo-Stereo

Published in: International Journal of Computer Vision (2023)

Abstract

Synthesizing novel views from a single image is a highly ill-posed problem. We present an effective solution that reduces the learning ambiguity by expanding the single-view view synthesis problem into a multi-view setting. Specifically, we leverage the reliable and explicit stereo prior to generate a pseudo-stereo viewpoint, which serves as an auxiliary input for constructing the 3D space. In this way, the challenging novel view synthesis process is decoupled into two simpler problems: stereo synthesis and 3D reconstruction. To synthesize a structurally correct and detail-preserving stereo image, we propose a self-rectified stereo synthesis scheme that amends erroneous regions in an identify-and-rectify manner. Hard-to-train and incorrectly warped regions are first discovered by two strategies: (1) pruning the network to reveal low-confidence predictions, and (2) matching bidirectionally between the stereo views to expose improper mappings. These regions are then inpainted to form the final pseudo-stereo view. With the aid of this extra input, a better 3D reconstruction can be easily obtained, and our method works with arbitrary 3D representations. Extensive experiments show that our method outperforms state-of-the-art single-view view synthesis and stereo synthesis methods.
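The bidirectional-matching strategy mentioned above is in the spirit of a standard left-right consistency check: each left-view pixel is mapped into the right view using its predicted disparity, and the disparity found at that location is compared against the original; large disagreements mark regions whose mapping is likely wrong and which should be handed to the inpainting stage. The sketch below is a minimal illustration of this idea only, not the authors' implementation; the function name, the nearest-neighbour lookup, and the one-pixel threshold are assumptions.

```python
# Minimal sketch of a left-right (bidirectional) consistency check, assuming
# rectified stereo with positive disparities. Not the paper's implementation;
# the threshold and nearest-neighbour sampling are illustrative choices.
import numpy as np

def lr_inconsistency_mask(disp_left, disp_right, thresh=1.0):
    """Flag left-view pixels whose disparity disagrees with the right view.

    disp_left, disp_right: (H, W) disparity maps predicted for each view.
    Returns a boolean (H, W) mask of candidate erroneous regions.
    """
    h, w = disp_left.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # A left pixel at column x corresponds to column x - d_L(x) in the right view.
    x_right = np.clip(np.round(xs - disp_left).astype(int), 0, w - 1)
    disp_right_at_match = disp_right[ys, x_right]  # nearest-neighbour lookup
    # Consistent pixels satisfy d_L(x) ~= d_R(x - d_L(x)); large gaps are suspect.
    return np.abs(disp_left - disp_right_at_match) > thresh
```

Pixels flagged by such a mask, together with the low-confidence regions exposed by pruning, would then be masked out and filled by the inpainting module to form the final pseudo-stereo view.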



Acknowledgements

This project is supported by the National Natural Science Foundation of China (No. 61972162); Project of Strategic Importance in The Hong Kong Polytechnic University (project no. 1-ZE2Q); Guangdong Natural Science Foundation (No. 2021A1515012625); Guangdong Natural Science Funds for Distinguished Young Scholar (No. 2023B1515020097); Singapore Ministry of Education Academic Research Fund Tier 1 (MSS23C002).

Author information


Corresponding author

Correspondence to Shengfeng He.

Additional information

Communicated by Boxin Shi, Ph.D.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Zhou, Y., Wu, H., Liu, W. et al. Single-View View Synthesis with Self-rectified Pseudo-Stereo. Int J Comput Vis 131, 2032–2043 (2023). https://doi.org/10.1007/s11263-023-01803-z
