Abstract
Recent optical flow estimators usually employ deep models designed for image classification as encoders for feature extraction and matching. However, encoders developed for image classification may be sub-optimal for flow estimation. In contrast, the decoders of optical flow estimators are typically designed meticulously for flow estimation. This disconnect between encoder and decoder could negatively affect optical flow estimation. To address this issue, we propose a neural architecture search method, FlowNAS, to automatically find a more suitable and stronger encoder architecture for existing flow decoders. We first design a suitable search space, including various convolutional operators, and construct a weight-sharing super-network for efficiently evaluating the candidate architectures. To better train the super-network, we present a Feature Alignment Distillation module that uses a well-trained flow estimator to guide the training of the super-network. Finally, a resource-constrained evolutionary algorithm is used to determine an optimal architecture (i.e., sub-network). Experimental results show that FlowNAS can be easily incorporated into existing flow estimators and achieves state-of-the-art performance with a favorable trade-off between accuracy and efficiency. Furthermore, the encoder architecture discovered by FlowNAS, with weights inherited from the super-network, achieves 4.67% F1-all error on KITTI, an 8.4% reduction relative to the RAFT baseline, surpassing the state-of-the-art handcrafted GMA and AGFlow models while reducing model complexity and latency. The source code and trained models will be released at https://github.com/VDIGPKU/FlowNAS.
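The final search step named in the abstract, a resource-constrained evolutionary algorithm over sub-networks, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from FlowNAS: the search space (`KERNELS`, `DEPTHS`, `NUM_STAGES`), the resource proxy `cost`, and the toy fitness `proxy_error`, which stands in for evaluating a sub-network with weights inherited from the super-network.

```python
import random

# Hypothetical search space: each encoder stage picks a kernel size and a depth.
KERNELS = (3, 5, 7)
DEPTHS = (1, 2, 3)
NUM_STAGES = 4


def random_arch(rng):
    """Sample one architecture as a tuple of (kernel, depth) genes."""
    return tuple((rng.choice(KERNELS), rng.choice(DEPTHS)) for _ in range(NUM_STAGES))


def cost(arch):
    """Toy resource proxy standing in for FLOPs or latency (assumption)."""
    return sum(k * k * d for k, d in arch)


def proxy_error(arch):
    """Toy fitness, lower is better. A real search would instead evaluate the
    sub-network on validation data using weights inherited from the super-network."""
    return sum(abs(k - 5) + abs(d - 2) for k, d in arch)


def mutate(arch, rng, prob=0.3):
    """Resample each gene independently with probability `prob`."""
    return tuple(
        (rng.choice(KERNELS), rng.choice(DEPTHS)) if rng.random() < prob else gene
        for gene in arch
    )


def crossover(a, b, rng):
    """Pick each gene from one of the two parents at random."""
    return tuple(rng.choice(pair) for pair in zip(a, b))


def sample_feasible(make, budget, max_tries=1000):
    """Call `make` until it yields a candidate whose cost fits the budget."""
    for _ in range(max_tries):
        cand = make()
        if cost(cand) <= budget:
            return cand
    raise RuntimeError("no feasible candidate found")


def evolutionary_search(budget, population=20, parents=5, generations=10, seed=0):
    """Return the lowest-error architecture found under the resource budget."""
    rng = random.Random(seed)
    pop = [sample_feasible(lambda: random_arch(rng), budget) for _ in range(population)]
    for _ in range(generations):
        # Keep the best candidates as parents (elitism), then refill the population.
        top = sorted(pop, key=proxy_error)[:parents]
        children = []
        while len(children) < population - parents:
            if rng.random() < 0.5:
                make = lambda: mutate(rng.choice(top), rng)
            else:
                make = lambda: crossover(rng.choice(top), rng.choice(top), rng)
            children.append(sample_feasible(make, budget))
        pop = top + children
    return min(pop, key=proxy_error)
```

Calling `evolutionary_search(budget=150)` returns a tuple of four `(kernel, depth)` pairs whose `cost` does not exceed 150. Rejecting infeasible candidates at sampling time is one simple way to enforce the constraint; a penalty term in the fitness would be an alternative.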
Acknowledgements
This work was supported in part by the National Natural Science Foundation of China under Grant 62176007. This work was also a research outcome of the Key Laboratory of Science, Technology and Standard in Press Industry (Key Laboratory of Intelligent Press Media Technology).
Additional information
Communicated by Jianfei Cai.
About this article
Cite this article
Lin, Z., Liang, T., Xiao, T. et al. FlowNAS: Neural Architecture Search for Optical Flow Estimation. Int J Comput Vis 132, 1055–1074 (2024). https://doi.org/10.1007/s11263-023-01920-9
DOI: https://doi.org/10.1007/s11263-023-01920-9