FlowNAS: Neural Architecture Search for Optical Flow Estimation

International Journal of Computer Vision

Abstract

Recent optical flow estimators usually employ deep models designed for image classification as encoders for feature extraction and matching. However, encoders developed for image classification may be sub-optimal for flow estimation, whereas the decoders of optical flow estimators are typically designed meticulously for the task. This mismatch between encoder and decoder can degrade optical flow estimation. To address this issue, we propose a neural architecture search method, FlowNAS, to automatically find a more suitable and stronger encoder architecture for existing flow decoders. We first design a suitable search space, including various convolutional operators, and construct a weight-sharing super-network for efficiently evaluating the candidate architectures. To better train the super-network, we present a Feature Alignment Distillation module that utilizes a well-trained flow estimator to guide the training of the super-network. Finally, a resource-constrained evolutionary algorithm is exploited to determine an optimal architecture (i.e., sub-network). Experimental results show that FlowNAS can be easily incorporated into existing flow estimators and achieves state-of-the-art performance with a favorable trade-off between accuracy and efficiency. Furthermore, the encoder architecture discovered by FlowNAS, with weights inherited from the super-network, achieves 4.67% F1-all error on KITTI, an 8.4% reduction over the RAFT baseline, surpassing the state-of-the-art handcrafted GMA and AGFlow models while reducing model complexity and latency. The source code and trained models will be released at https://github.com/VDIGPKU/FlowNAS.
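
To make the pipeline in the abstract concrete, below is a minimal, illustrative PyTorch sketch of its three stages: uniform single-path sampling from a weight-sharing super-network, a feature-alignment distillation step against a fixed teacher, and a resource-constrained evolutionary search over architecture encodings. The candidate operator set, the plain MSE alignment loss, the kernel-size resource proxy, and the toy fitness function are simplifying assumptions for illustration, not the paper's exact design.

```python
import random

import torch
import torch.nn as nn
import torch.nn.functional as F

CANDIDATE_KERNELS = [3, 5, 7]  # assumed candidate operator set
DEPTH = 4                      # assumed number of searchable layers


class MixedLayer(nn.Module):
    """One super-network layer holding all candidate ops; an index selects one path."""

    def __init__(self, channels):
        super().__init__()
        self.ops = nn.ModuleList(
            nn.Conv2d(channels, channels, k, padding=k // 2) for k in CANDIDATE_KERNELS
        )

    def forward(self, x, choice):
        return F.relu(self.ops[choice](x))


class SuperEncoder(nn.Module):
    """Weight-sharing super-network; `arch` is a list of per-layer op indices."""

    def __init__(self, channels=32, depth=DEPTH):
        super().__init__()
        self.stem = nn.Conv2d(3, channels, 3, padding=1)
        self.layers = nn.ModuleList(MixedLayer(channels) for _ in range(depth))

    def forward(self, x, arch):
        x = self.stem(x)
        for layer, choice in zip(self.layers, arch):
            x = layer(x, choice)
        return x


def sample_arch():
    return [random.randrange(len(CANDIDATE_KERNELS)) for _ in range(DEPTH)]


supernet = SuperEncoder()
teacher = SuperEncoder().requires_grad_(False)  # stand-in for a well-trained flow encoder
opt = torch.optim.Adam(supernet.parameters(), lr=1e-4)

# One super-network training step: sample a single path uniformly, then align the
# sampled sub-network's features with the teacher's (feature-alignment distillation).
images = torch.randn(2, 3, 64, 64)  # dummy batch
arch = sample_arch()
with torch.no_grad():
    teacher_feat = teacher(images, [0] * DEPTH)  # teacher uses a fixed architecture
loss = F.mse_loss(supernet(images, arch), teacher_feat)
opt.zero_grad()
loss.backward()
opt.step()

# Resource-constrained evolutionary search over architecture encodings.
BUDGET = 4 * 5 ** 2  # assumed resource budget, in proxy units


def cost(arch):  # proxy resource cost: larger kernels cost more
    return sum(CANDIDATE_KERNELS[c] ** 2 for c in arch)


def fitness(arch):  # placeholder score; see the note below the sketch
    if cost(arch) > BUDGET:
        return float("-inf")  # reject architectures over the resource budget
    with torch.no_grad():
        return -supernet(images, arch).abs().mean().item()


population = [sample_arch() for _ in range(8)]
for _ in range(10):
    parents = sorted(population, key=fitness, reverse=True)[:4]
    children = []
    for p in parents:
        child = list(p)  # mutate one randomly chosen layer's operator
        child[random.randrange(DEPTH)] = random.randrange(len(CANDIDATE_KERNELS))
        children.append(child)
    population = parents + children

best = max(population, key=fitness)
print("best architecture (per-layer kernel indices):", best)
```

In the actual method, the fitness of a sub-network would be its validation flow error evaluated with weights inherited from the super-network, and the resource constraint would be measured in real FLOPs or latency rather than the toy kernel-size proxy above.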

References

  • Bailer, C., Taetz, B., Stricker, D. (2015). Flow fields: Dense correspondence fields for highly accurate large displacement optical flow estimation. In IEEE International Conference on Computer Vision (ICCV), pp 4015–4023

  • Bender, G., Kindermans, P., Zoph, B., Vasudevan, V., Le, Q. V. (2018). Understanding and simplifying one-shot architecture search. In International Conference on Machine Learning (ICML)

  • Biswas, B., Kr Ghosh, S., Hore, M., & Ghosh, A. (2022). SIFT-based visual tracking using optical flow and belief propagation algorithm. The Computer Journal, 65(1), 1–17.

  • Brock, A., Lim, T., Ritchie, J. M., Weston, N. (2018). SMASH: one-shot model architecture search through hypernetworks. In International Conference on Learning Representations (ICLR)

  • Butler, D. J., Wulff, J., Stanley, G. B., Black, M. J. (2012). A naturalistic open source movie for optical flow evaluation. In European Conference on Computer Vision (ECCV), pp 611–625

  • Cai, H., Gan, C., Wang, T., Zhang, Z., Han, S. (2020). Once-for-all: Train one network and specialize it for efficient deployment. In International Conference on Learning Representations (ICLR)

  • Cai, H., Zhu, L., Han, S. (2019). ProxylessNAS: Direct neural architecture search on target task and hardware. In International Conference on Learning Representations (ICLR)

  • Chen, Y., Guo, Y., Chen, Q., Li, M., Zeng, W., Wang, Y., Tan, M. (2021). Contrastive neural architecture search with neural architecture comparators. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

  • Cheng, X., Zhong, Y., Harandi, M., Dai, Y., Chang, X., Li, H., Drummond, T., Ge, Z. (2020). Hierarchical neural architecture search for deep stereo matching. In Neural Information Processing Systems (NeurIPS)

  • Chollet, F. (2017). Xception: Deep learning with depthwise separable convolutions. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 1800–1807

  • Chu, X., Zhang, B., Xu, R., Li, J. (2021). FairNAS: Rethinking evaluation fairness of weight sharing neural architecture search. In IEEE International Conference on Computer Vision (ICCV)

  • Chu, X., Zhou, T., Zhang, B., Li, J. (2020). Fair DARTS: Eliminating unfair advantages in differentiable architecture search. In European Conference on Computer Vision (ECCV)

  • de Jong, D., Paredes-Vallés, F., de Croon, G. (2022). How do neural networks estimate optical flow? A neuropsychology-inspired study. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)

  • Dosovitskiy, A., Fischer, P., Ilg, E., Hausser, P., Hazirbas, C., Golkov, V., Van Der Smagt, P., Cremers, D., Brox, T. (2015). FlowNet: Learning optical flow with convolutional networks. In IEEE International Conference on Computer Vision (ICCV), pp 2758–2766

  • Fortun, D., Bouthemy, P., & Kervrann, C. (2015). Optical flow modeling and computation: A survey. Computer Vision and Image Understanding (CVIU), 134, 1–21.

  • Gao, S., Huang, F., Cai, W., Huang, H. (2021). Network pruning via performance maximization. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

  • Geiger, A., Lenz, P., Stiller, C., & Urtasun, R. (2013). Vision meets robotics: The KITTI dataset. The International Journal of Robotics Research, 32(11), 1231–1237.

  • Gou, J., Yu, B., Maybank, S. J., Tao, D. (2021). Knowledge distillation: A survey. International Journal of Computer Vision (IJCV)

  • Guo, Y., Zheng, Y., Tan, M., Chen, Q., Li, Z., Chen, J., Zhao, P., Huang, J. (2022). Towards accurate and compact architectures via neural architecture transformer. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)

  • Guo, Z., Zhang, X., Mu, H., Heng, W., Liu, Z., Wei, Y., Sun, J. (2020). Single path one-shot neural architecture search with uniform sampling. In European Conference on Computer Vision (ECCV)

  • He, K., Zhang, X., Ren, S., Sun, J. (2016). Deep residual learning for image recognition. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 770–778

  • Hui, T. W., Tang, X., Loy, C. C. (2019). A lightweight optical flow CNN: Revisiting data fidelity and regularization. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)

  • Hur, J., Roth, S. (2019). Iterative residual refinement for joint optical flow and occlusion estimation. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 5754–5763

  • Ilg, E., Mayer, N., Saikia, T., Keuper, M., Dosovitskiy, A., Brox, T. (2017). FlowNet 2.0: Evolution of optical flow estimation with deep networks. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 2462–2470

  • Jiang, H., Learned-Miller, E. G. (2021). DCVNet: Dilated cost volume networks for fast optical flow. arXiv preprint arXiv:2103.17271

  • Jiang, S., Campbell, D., Lu, Y., Li, H., Hartley, R. (2021a). Learning to estimate hidden motions with global motion aggregation. In IEEE International Conference on Computer Vision (ICCV), pp 9772–9781

  • Jiang, S., Lu, Y., Li, H., Hartley, R. (2021b). Learning optical flow from a few matches. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 16592–16600

  • Kondermann, D., Nair, R., Honauer, K., Krispin, K., Andrulis, J., Brock, A., Gussefeld, B., Rahimimoghaddam, M., Hofmann, S., Brenner, C., et al. (2016). The HCI benchmark suite: Stereo and flow ground truth with uncertainties for urban autonomous driving. In IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp 19–28

  • Krizhevsky, A., Hinton, G., et al. (2009). Learning multiple layers of features from tiny images. Technical report, University of Toronto

  • Li, C., Peng, J., Yuan, L., Wang, G., Liang, X., Lin, L., Chang, X. (2020a). Block-wisely supervised neural architecture search with knowledge distillation. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

  • Li, R., Tan, R. T., Cheong, L. (2020b). All in one bad weather removal using architectural search. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 3172–3182

  • Liang, T., Wang, Y., Tang, Z., Hu, G., Ling, H. (2021). OPANAS: One-shot path aggregation network architecture search for object detection. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 10195–10203

  • Liu, C., Chen, L., Schroff, F., Adam, H., Hua, W., Yuille, A. L., Fei-Fei, L. (2019a). Auto-DeepLab: Hierarchical neural architecture search for semantic image segmentation. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 82–92

  • Liu, C., Zoph, B., Neumann, M., Shlens, J., Hua, W., Li, L., Fei-Fei, L., Yuille, A. L., Huang, J., Murphy, K. (2018). Progressive neural architecture search. In European Conference on Computer Vision (ECCV), pp 19–35

  • Liu, H., Simonyan, K., Vinyals, O., Fernando, C., Kavukcuoglu, K. (2018b). Hierarchical representations for efficient architecture search. In International Conference on Learning Representations (ICLR)

  • Liu, H., Simonyan, K., Yang, Y. (2019b). DARTS: Differentiable architecture search. In International Conference on Learning Representations (ICLR)

  • Liu, J., Zhuang, B., Zhuang, Z., Guo, Y., Huang, J., Zhu, J., Tan, M. (2022). Discrimination-aware network pruning for deep model compression. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)

  • Liu, R., Ma, L., Zhang, J., Fan, X., Luo, Z. (2021). Retinex-inspired unrolling with cooperative prior architecture search for low-light image enhancement. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 10561–10570

  • Luo, A., Yang, F., Luo, K., Li, X., Fan, H., Liu, S. (2022). Learning optical flow with adaptive graph reasoning. In Association for the Advancement of Artificial Intelligence (AAAI)

  • Mayer, N., Ilg, E., Hausser, P., Fischer, P., Cremers, D., Dosovitskiy, A., Brox, T. (2016). A large dataset to train convolutional networks for disparity, optical flow, and scene flow estimation. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 4040–4048

  • Menze, M., Geiger, A. (2015). Object scene flow for autonomous vehicles. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 3061–3070

  • Pham, H., Guan, M., Zoph, B., Le, Q., Dean, J. (2018). Efficient neural architecture search via parameters sharing. In International Conference on Machine Learning (ICML)

  • Ranjan, A., Black, M. J. (2017). Optical flow estimation using a spatial pyramid network. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 4161–4170

  • Real, E., Aggarwal, A., Huang, Y., Le, Q. V. (2019). Regularized evolution for image classifier architecture search. In Association for the Advancement of Artificial Intelligence (AAAI), pp 4780–4789

  • Romero, A., Ballas, N., Kahou, S. E., Chassang, A., Gatta, C., Bengio, Y. (2015). FitNets: Hints for thin deep nets. In International Conference on Learning Representations (ICLR)

  • Saikia, T., Marrakchi, Y., Zela, A., Hutter, F., Brox, T. (2019). AutoDispNet: Improving disparity estimation with AutoML. In IEEE International Conference on Computer Vision (ICCV)

  • Schuster, R., Bailer, C., Wasenmüller, O., Stricker, D. (2018). FlowFields++: Accurate optical flow correspondences meet robust interpolation. In IEEE International Conference on Image Processing (ICIP), pp 1463–1467

  • Sun, D., Yang, X., Liu, M. Y., Kautz, J. (2018a). Models matter, so does training: An empirical study of CNNs for optical flow estimation. arXiv preprint arXiv:1809.05571

  • Sun, D., Yang, X., Liu, M. Y., Kautz, J. (2018b). PWC-Net: CNNs for optical flow using pyramid, warping, and cost volume. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 8934–8943

  • Sun, S., Kuang, Z., Sheng, L., Ouyang, W., Zhang, W. (2018c). Optical flow guided feature: A fast and robust motion representation for video action recognition. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

  • Tan, C., Li, C., He, D., Song, H. (2022). Towards real-time tracking and counting of seedlings with a one-stage detector and optical flow. Computers and Electronics in Agriculture, 106683

  • Tan, M., Pang, R., Le, Q. V. (2020). EfficientDet: Scalable and efficient object detection. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 10778–10787

  • Teed, Z., Deng, J. (2020). RAFT: Recurrent all-pairs field transforms for optical flow. In European Conference on Computer Vision (ECCV), pp 402–419

  • Wang, D., Li, M., Gong, C., Chandra, V. (2021). AttentiveNAS: Improving neural architecture search via attentive sampling. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 6418–6427

  • Wang, X., Girshick, R. B., Gupta, A., He, K. (2018). Non-local neural networks. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 7794–7803

  • Wulff, J., Sevilla-Lara, L., Black, M. J. (2017). Optical flow in mostly rigid scenes. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 4671–4680

  • Xiao, T., Yuan, J., Sun, D., Wang, Q., Zhang, X. Y., Xu, K., Yang, M. H. (2020). Learnable cost volume using the cayley representation. In European Conference on Computer Vision (ECCV), pp 483–499

  • Xie, S., Girshick, R., Dollár, P., Tu, Z., He, K. (2017). Aggregated residual transformations for deep neural networks. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

  • Xu, H., Zhang, J., Cai, J., Rezatofighi, H., Tao, D. (2022). GMFlow: Learning optical flow via global matching. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 8121–8130

  • Xu, J., Ranftl, R., Koltun, V. (2017). Accurate optical flow via direct cost volume processing. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 1289–1297

  • Xu, Y., Wang, Y., Han, K., Tang, Y., Jui, S., Xu, C., Xu, C. (2021). ReNAS: Relativistic evaluation of neural architecture search. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

  • Yang, G., Ramanan, D. (2019). Volumetric correspondence networks for optical flow. In Neural Information Processing Systems (NeurIPS), pp 793–803

  • Yang, Z., Li, Z., Shao, M., Shi, D., Yuan, Z., Yuan, C. (2022). Masked generative distillation. In European Conference on Computer Vision (ECCV)

  • Yin, Z., Darrell, T., Yu, F. (2019). Hierarchical discrete distribution decomposition for match density estimation. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 6044–6053

  • Yu, J., Jin, P., Liu, H., Bender, G., Kindermans, P., Tan, M., Huang, T. S., Song, X., Pang, R., Le, Q. (2020). BigNAS: Scaling up neural architecture search with big single-stage models. In European Conference on Computer Vision (ECCV), pp 702–717

  • Yu, J., Yang, L., Xu, N., Yang, J., Huang, T. S. (2019). Slimmable neural networks. In International Conference on Learning Representations (ICLR)

  • Yuan, F., Shou, L., Pei, J., Lin, W., Gong, M., Fu, Y., Jiang, D. (2021). Reinforced multi-teacher selection for knowledge distillation. In Association for the Advancement of Artificial Intelligence (AAAI)

  • Zagoruyko, S., Komodakis, N. (2017). Paying more attention to attention: Improving the performance of convolutional neural networks via attention transfer. In International Conference on Learning Representations (ICLR)

  • Zhang, F., Woodford, O. J., Prisacariu, V. A., Torr, P. H. (2021). Separable flow: Learning motion cost volumes for optical flow estimation. In IEEE International Conference on Computer Vision (ICCV), pp 10807–10817

  • Zhang, H., Li, Y., Chen, H., Shen, C. (2020). Memory-efficient hierarchical neural architecture search for image denoising. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 3654–3663

  • Zhang, X., Zhou, X., Lin, M., Sun, J. (2018). ShuffleNet: An extremely efficient convolutional neural network for mobile devices. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 6848–6856

  • Zhang, Y., Qiu, Z., Liu, J., Yao, T., Liu, D., Mei, T. (2019). Customizable architecture search for semantic segmentation. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 11641–11650

  • Zhao, H., Shi, J., Qi, X., Wang, X., Jia, J. (2017). Pyramid scene parsing network. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 6230–6239

  • Zhao, S., Sheng, Y., Dong, Y., Chang, E. I., Xu, Y., et al. (2020). MaskFlowNet: Asymmetric feature matching with learnable occlusion mask. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 6278–6287

  • Zoph, B., Le, Q. V. (2017). Neural architecture search with reinforcement learning. In International Conference on Learning Representations (ICLR)

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China under Grant 62176007, and is a research outcome of the Key Laboratory of Science, Technology and Standard in Press Industry (Key Laboratory of Intelligent Press Media Technology).

Author information

Corresponding author

Correspondence to Yongtao Wang.

Additional information

Communicated by Jianfei Cai.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Lin, Z., Liang, T., Xiao, T. et al. FlowNAS: Neural Architecture Search for Optical Flow Estimation. Int J Comput Vis 132, 1055–1074 (2024). https://doi.org/10.1007/s11263-023-01920-9
