Abstract
A popular approach to scene flow estimation is to use point cloud data from LiDAR scans; by contrast, learning 3D motion from camera images has received comparatively little attention. Estimating scene flow from a monocular camera remains challenging due to the ill-posedness of the problem and the lack of annotated data. Self-supervised methods have demonstrated that scene flow can be learned from unlabeled data, yet their accuracy still lags behind (semi-)supervised methods. In this paper, we introduce a self-supervised monocular scene flow method that substantially improves accuracy over previous approaches. Based on RAFT, a state-of-the-art optical flow model, we design a new decoder that iteratively updates the 3D motion field and the disparity map simultaneously. Furthermore, we propose an enhanced upsampling layer and a disparity initialization technique, which together further improve accuracy by up to 7.2%. Our method achieves state-of-the-art accuracy among all self-supervised monocular scene flow methods, improving over the previous best by 34.2%. Our fine-tuned model outperforms the best previous semi-supervised method while running 228 times faster. Code will be publicly available to ensure reproducibility.
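To make the recurrent design concrete, below is a minimal PyTorch-style sketch of a joint iterative decoder of this kind. All names (ConvGRUCell, IterativeSceneFlowDecoder, corr_fn, the channel sizes) are hypothetical illustrations rather than the authors' implementation, and the correlation lookup and the upsampling layer are abstracted away:

```python
import torch
import torch.nn as nn


class ConvGRUCell(nn.Module):
    """Convolutional GRU cell, a common recurrent unit in RAFT-style decoders."""

    def __init__(self, hidden_dim, input_dim):
        super().__init__()
        self.convz = nn.Conv2d(hidden_dim + input_dim, hidden_dim, 3, padding=1)
        self.convr = nn.Conv2d(hidden_dim + input_dim, hidden_dim, 3, padding=1)
        self.convq = nn.Conv2d(hidden_dim + input_dim, hidden_dim, 3, padding=1)

    def forward(self, h, x):
        hx = torch.cat([h, x], dim=1)
        z = torch.sigmoid(self.convz(hx))                        # update gate
        r = torch.sigmoid(self.convr(hx))                        # reset gate
        q = torch.tanh(self.convq(torch.cat([r * h, x], dim=1)))
        return (1 - z) * h + z * q


class IterativeSceneFlowDecoder(nn.Module):
    """Jointly refines a 3D motion field (3 channels) and a disparity map (1 channel)."""

    def __init__(self, hidden_dim=128, context_dim=128, corr_dim=196):
        super().__init__()
        # GRU input: correlation + context features + current estimates (3 + 1 channels).
        self.gru = ConvGRUCell(hidden_dim, corr_dim + context_dim + 4)
        self.motion_head = nn.Conv2d(hidden_dim, 3, 3, padding=1)  # residual 3D motion
        self.disp_head = nn.Conv2d(hidden_dim, 1, 3, padding=1)    # residual disparity

    def forward(self, h, context, corr_fn, motion, disp, iters=12):
        predictions = []
        for _ in range(iters):
            corr = corr_fn(motion, disp)  # look up matching costs at the current estimate
            x = torch.cat([corr, context, motion, disp], dim=1)
            h = self.gru(h, x)
            motion = motion + self.motion_head(h)  # residual update of the 3D motion field
            disp = disp + self.disp_head(h)        # residual update of the disparity map
            predictions.append((motion, disp))
        return predictions
```

As in RAFT, a loss (here, self-supervised photometric and consistency terms) can be applied to every intermediate prediction, encouraging the recurrent unit to behave as a learned optimizer over both fields.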
Data Availability
The datasets generated and/or analyzed during the current study are available in The KITTI Vision Benchmark Suite, https://www.cvlibs.net/datasets/kitti/.
Acknowledgements
This work was supported by NSFC, China (No. 62176155) and the Shanghai Municipal Science and Technology Major Project, China (2021SHZDZX0102).
About this article
Cite this article
Bayramli, B., Hur, J. & Lu, H. RAFT-MSF: Self-Supervised Monocular Scene Flow Using Recurrent Optimizer. Int J Comput Vis 131, 2757–2769 (2023). https://doi.org/10.1007/s11263-023-01828-4