Abstract
Single object tracking (SOT) is a fundamental problem in computer vision with a wide range of applications, including autonomous driving, augmented reality, and robot navigation. The robustness of SOT faces two main challenges: tiny targets and fast motion. These challenges are especially pronounced in videos captured by unmanned aerial vehicles (UAVs), where the target is usually far from the camera and often moves significantly relative to it. To evaluate the robustness of SOT methods, we propose BioDrone—the first bionic drone-based visual benchmark for SOT. Unlike existing UAV datasets, BioDrone features videos captured from a flapping-wing UAV system, whose aerodynamics induce major camera shake. BioDrone thus highlights the tracking of tiny targets under drastic changes between consecutive frames, providing a new robust vision benchmark for SOT. To date, BioDrone is the largest UAV-based SOT benchmark, offering high-quality fine-grained manual annotations and automatically generated frame-level labels designed for robust vision analyses. Leveraging BioDrone, we conduct a systematic evaluation of existing SOT methods, comparing the performance of 20 representative models and studying novel means of optimizing a state-of-the-art method (KeepTrack; Mayer et al., 2021) for robust SOT. Our evaluation leads to new baselines and insights for robust SOT. Moving forward, we hope that BioDrone will not only serve as a high-quality benchmark for robust SOT, but also invite future research into robust computer vision. The database, toolkits, evaluation server, and baseline results are available at http://biodrone.aitestunion.com.
Availability of data and materials
All data will be made available on reasonable request.
References
Abu Alhaija, H., Mustikovela, S. K., Mescheder, L., Geiger, A., & Rother, C. (2018). Augmented reality meets computer vision: Efficient data generation for urban driving scenes. International Journal of Computer Vision, 126(9), 961–972.
Barrientos, A., Colorado, J., Martinez, A., & Valente, J. (2010). Rotary-wing mav modeling and control for indoor scenarios. In 2010 IEEE international conference on industrial technology (pp. 1475–1480). IEEE.
Bertinetto, L., Valmadre, J., Henriques, J. F., Vedaldi, A., & Torr, P. H. (2016). Fully-convolutional siamese networks for object tracking. In European conference on computer vision (pp. 850–865). Springer.
Bhat, G., Danelljan, M., Gool, L. V., & Timofte, R. (2019). Learning discriminative model prediction for tracking. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 6182–6191).
Bhat, G., Danelljan, M., Gool, L. V., & Timofte, R. (2020). Know your surroundings: Exploiting scene information for object tracking. In European conference on computer vision (pp. 205–221). Springer.
Bondi, E., Dey, D., Kapoor, A., Piavis, J., Shah, S., Fang, F., Dilkina, B., Hannaford, R., Iyer, A., Joppa, L., et al. (2018). Airsim-w: A simulation environment for wildlife conservation with uavs. In Proceedings of the 1st ACM SIGCAS conference on computing and sustainable societies (pp. 1–12).
Bondi, E., Jain, R., Aggrawal, P., Anand, S., Hannaford, R., Kapoor, A., Piavis, J., Shah, S., Joppa, L., & Dilkina, B., et al. (2020). Birdsai: A dataset for detection and tracking in aerial thermal infrared videos. In Proceedings of the IEEE/CVF Winter conference on applications of computer vision (pp. 1747–1756).
Cao, Z., Huang, Z., Pan, L., Zhang, S., Liu, Z., & Fu, C. (2022). Tctrack: Temporal contexts for aerial tracking. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 14798–14808).
Chatfield, K., Simonyan, K., Vedaldi, A., & Zisserman, A. (2014). Return of the devil in the details: Delving deep into convolutional nets. arXiv preprint arXiv:1405.3531.
Cui, Y., Jiang, C., Wang, L., & Wu, G. (2022). Mixformer: End-to-end tracking with iterative mixed attention. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 13608–13618).
Dalal, N., & Triggs, B. (2005). Histograms of oriented gradients for human detection. In 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05) (Vol. 1, pp. 886–893). IEEE.
Danelljan, M., Bhat, G., Khan, F. S., & Felsberg, M. (2019). Atom: Accurate tracking by overlap maximization. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 4660–4669).
Danelljan, M., Bhat, G., Shahbaz Khan, F., & Felsberg, M. (2017). Eco: Efficient convolution operators for tracking. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 6638–6646).
Danelljan, M., Gool, L. V., & Timofte, R. (2020). Probabilistic regression for visual tracking. In 2020 IEEE conference on computer vision and pattern recognition (CVPR).
De Croon, G., Perçin, M., Remes, B., Ruijsink, R., & De Wagter, C. (2016). The delfly (pp. 978–94). Dordrecht: Springer.
Dendorfer, P., Osep, A., Milan, A., Schindler, K., Cremers, D., Reid, I., Roth, S., & Leal-Taixé, L. (2021). Motchallenge: A benchmark for single-camera multiple target tracking. International Journal of Computer Vision, 129(4), 845–881.
DeTone, D., Malisiewicz, T., & Rabinovich, A. (2018). Superpoint: Self-supervised interest point detection and description. In Proceedings of the IEEE conference on computer vision and pattern recognition workshops (pp. 224–236).
Dupeyroux, J., Serres, J. R., & Viollet, S. (2019). Antbot: A six-legged walking robot able to home like desert ants in outdoor environments. Science Robotics, 4(27), eaau0307.
Fan, H., Bai, H., Lin, L., Yang, F., Chu, P., Deng, G., Yu, S., Huang, M., Liu, J., Xu, Y., et al. (2021). Lasot: A high-quality large-scale single object tracking benchmark. International Journal of Computer Vision, 129(2), 439–461.
Finlayson, G. D., & Trezzi, E. (2004). Shades of gray and colour constancy. In The twelfth color imaging conference 2004 (pp. 37–41).
Fraire, A. E., Morado, R. P., López, A. D., & Leal, R. L. (2015). Design and implementation of fixed-wing mav controllers. In 2015 Workshop on research, education and development of unmanned aerial systems (RED-UAS) (pp. 172–179). IEEE.
Gauglitz, S., Höllerer, T., & Turk, M. (2011). Evaluation of interest point detectors and feature descriptors for visual tracking. International Journal of Computer Vision, 94(3), 335–360.
Girshick, R. (2015). Fast r-cnn. In Proceedings of the IEEE international conference on computer vision (pp. 1440–1448).
Guo, D., Wang, J., Cui, Y., Wang, Z., & Chen, S. (2020). Siamcar: Siamese fully convolutional classification and regression for visual tracking. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 6269–6277).
Han, L., Wang, P., Yin, Z., Wang, F., & Li, H. (2021). Context and structure mining network for video object detection. International Journal of Computer Vision, 129(10), 2927–2946.
He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 770–778).
Henriques, J. F., Caseiro, R., Martins, P., & Batista, J. (2014). High-speed tracking with kernelized correlation filters. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(3), 583–596.
Hsieh, M.-R., Lin, Y.-L., & Hsu, W. H. (2017). Drone-based object counting by spatially regularized regional proposal network. In Proceedings of the IEEE international conference on computer vision (pp. 4145–4153).
Hu, Q., Yang, B., Khalid, S., Xiao, W., Trigoni, N., & Markham, A. (2022). Sensaturban: Learning semantics from urban-scale photogrammetric point clouds. International Journal of Computer Vision, 130(2), 316–343.
Hu, S., Zhao, X., & Huang, K. (2023). SOTVerse: A user-defined task space of single object tracking. International Journal of Computer Vision. https://doi.org/10.1007/s11263-023-01908-5.
Hu, S., Zhao, X., Huang, L., & Huang, K. (2023). Global instance tracking: Locating target more like humans. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(1), 576–592.
Huang, L., Zhao, X., & Huang, K. (2020). Globaltrack: A simple and strong baseline for long-term tracking. In Proceedings of the AAAI conference on artificial intelligence (Vol. 34, pp. 11037–11044).
Huang, L., Zhao, X., & Huang, K. (2021). Got-10k: A large high-diversity benchmark for generic object tracking in the wild. IEEE Transactions on Pattern Analysis and Machine Intelligence, 43(5), 1562–1577.
Jiang, B., Luo, R., Mao, J., Xiao, T., & Jiang, Y. (2018). Acquisition of localization confidence for accurate object detection. In Proceedings of the European conference on computer vision (ECCV) (pp. 784–799).
Kong, Y., & Fu, Y. (2022). Human action recognition and prediction: A survey. International Journal of Computer Vision, 130(5), 1366–1401.
Kristan, M., Leonardis, A., Matas, J., Felsberg, M., Pflugfelder, R., Čehovin, L., Vojír, T., Häger, G., Lukežič, A., Fernández, G., Gupta, A., Petrosino, A., Memarmoghadam, A., Garcia-Martin, A., Solís Montero, A., et al. (2016). The visual object tracking VOT2016 challenge results. In Computer vision—ECCV 2016 workshops (pp. 777–823). Springer.
Kristan, M., Leonardis, A., Matas, J., Felsberg, M., Pflugfelder, R., Kämäräinen, J.-K., Danelljan, M., Zajc, L.Č., Lukežič, A., Drbohlav, O., He, L., et al. (2020). The eighth visual object tracking VOT2020 challenge results. In Computer vision—ECCV 2020 workshops (pp. 547–601). Springer.
Kristan, M., Leonardis, A., Matas, J., Felsberg, M., Pflugfelder, R., Zajc, L. Č., Vojír, T., Bhat, G., Lukežič, A., Eldesokey, A., Fernández, G., et al. (2019a). The sixth visual object tracking VOT2018 challenge results. In Computer vision—ECCV 2018 workshops (pp. 3–53). Springer.
Kristan, M., Leonardis, A., Matas, J., Felsberg, M., Pflugfelder, R., Zajc, L. C., Vojír, T., Häger, G., Lukežic, A., Eldesokey, A., Fernández, G., García-Martín, Á., Muhic, A., Petrosino, A., Memarmoghadam, A., et al. (2017). The visual object tracking VOT2017 challenge results. In Proceedings of 2017 IEEE international conference on computer vision workshops (ICCVW) (pp. 1949–1972). IEEE, Venice, Italy.
Kristan, M., Matas, J., Leonardis, A., Felsberg, M., Cehovin, L., Fernandez, G., Vojir, T., Hager, G., Nebehay, G., Pflugfelder, R., Gupta, A., Bibi, A., Lukezic, A., Garcia-Martin, A., Saffari, A., Petrosino, A., & Solis Montero, A. (2015). The visual object tracking VOT2015 challenge results. In Proceedings of 2015 IEEE international conference on computer vision workshop (ICCVW) (pp. 564–586). IEEE.
Kristan, M., Matas, J., Leonardis, A., Felsberg, M., Pflugfelder, R., Kämäräinen, J.-K., Cehovin Zajc, L., Drbohlav, O., Lukezic, A., Berg, A., Eldesokey, A., Käpylä, J., Fernández, G., Gonzalez-Garcia, A., Memarmoghadam, A., et al. (2019b). The seventh visual object tracking VOT2019 challenge results. In Proceedings of 2019 IEEE/CVF international conference on computer vision workshop (ICCVW) (pp. 2206–2241). IEEE, Seoul, Korea (South).
Kristan, M., Matas, J., Leonardis, A., Felsberg, M., Pflugfelder, R., Kämäräinen, J.-K., Chang, H. J., Danelljan, M., Zajc, L. Č., Lukežič, A., Drbohlav, O., et al. (2021). The ninth visual object tracking VOT2021 challenge results. In Proceedings of 2021 IEEE/CVF international conference on computer vision workshops (ICCVW) (pp. 2711–2738). IEEE, Montreal, BC, Canada.
Kristan, M., Pflugfelder, R., Leonardis, A., Matas, J., Porikli, F., Cehovin, L., Nebehay, G., Fernandez, G., Vojir, T., Gatt, A., Khajenezhad, A., Salahledin, A., Soltani-Farani, A., et al. (2013). The visual object tracking VOT2013 challenge results. In Proceedings of 2013 IEEE international conference on computer vision workshops (ICCVW) (pp. 98–111). IEEE.
Kristan, M., Pflugfelder, R. P., Leonardis, A., Matas, J., Cehovin, L., Nebehay, G., Vojír, T., Fernández, G., Lukezic, A., Dimitriev, A., Petrosino, A., Saffari, A. A., et al. (2014). The visual object tracking VOT2014 challenge results. In L. Agapito, M. M. Bronstein, & C. Rother (Eds.), Computer vision: ECCV 2014 workshops (Vol. 8926, pp. 191–217). Springer.
Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems 25.
Lee, N., Lee, S., Cho, H., & Shin, S. (2018). Effect of flexibility on flapping wing characteristics in hover and forward flight. Computers & Fluids, 173, 111–117.
Li, B., Wu, W., Wang, Q., Zhang, F., Xing, J., & Yan, J. (2019). Siamrpn++: Evolution of siamese visual tracking with very deep networks. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 4282–4291).
Li, B., Yan, J., Wu, W., Zhu, Z., & Hu, X. (2018). High performance visual tracking with siamese region proposal network. In The IEEE conference on computer vision and pattern recognition (CVPR).
Li, S., & Yeung, D.-Y. (2017). Visual object tracking for unmanned aerial vehicles: A benchmark and new motion models. In Thirty-first AAAI conference on artificial intelligence.
Liu, L., Ouyang, W., Wang, X., Fieguth, P., Chen, J., Liu, X., & Pietikäinen, M. (2020). Deep learning for generic object detection: A survey. International Journal of Computer Vision, 128(2), 261–318.
Luiten, J., Osep, A., Dendorfer, P., Torr, P., Geiger, A., Leal-Taixé, L., & Leibe, B. (2021). Hota: A higher order metric for evaluating multi-object tracking. International Journal of Computer Vision, 129(2), 548–578.
Mayer, C., Danelljan, M., Paudel, D.P., & Van Gool, L. (2021). Learning target candidate association to keep track of what not to track. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 13444–13454).
McMasters, J., & Cummings, R. (2004). Rethinking the airplane design process: An early 21st century perspective. In 42nd AIAA aerospace sciences meeting and exhibit (p. 693).
McMasters, J. H., & Cummings, R. M. (2002). Airplane design: Past, present, and future. Journal of Aircraft, 39(1), 10–17.
Muller, M., Bibi, A., Giancola, S., Alsubaihi, S., & Ghanem, B. (2018). Trackingnet: A large-scale dataset and benchmark for object tracking in the wild. In Proceedings of the European conference on computer vision (ECCV) (pp. 300–317).
Müller, M., Casser, V., Lahoud, J., Smith, N., & Ghanem, B. (2018). Sim4cv: A photo-realistic simulator for computer vision applications. International Journal of Computer Vision, 126(9), 902–919.
Mueller, M., Smith, N., & Ghanem, B. (2016). A benchmark and simulator for uav tracking. In European conference on computer vision (pp. 445–461). Springer.
Pech-Pacheco, J. L., Cristobal, G., Chamorro-Martinez, J., & Fernandez-Valdivia, J. (2000). Diatom autofocusing in brightfield microscopy: A comparative study. In Proceedings 15th international conference on pattern recognition. ICPR-2000 (Vol. 3, pp. 314–317).
Pornsin-Sirirak, T. N., Tai, Y.-C., Ho, C.-M., & Keennon, M. (2001). Microbat: A palm-sized electrically powered ornithopter. In Proceedings of NASA/JPL workshop on biomorphic robotics (Vol. 14, p. 17). Citeseer.
Ramakrishnan, S. K., Jayaraman, D., & Grauman, K. (2021). An exploration of embodied visual exploration. International Journal of Computer Vision, 129(5), 1616–1649.
Ren, S., He, K., Girshick, R., & Sun, J. (2015). Faster r-cnn: Towards real-time object detection with region proposal networks. In Advances in neural information processing systems 28.
Rezatofighi, H., Tsoi, N., Gwak, J., Sadeghian, A., Reid, I., & Savarese, S. (2019). Generalized intersection over union: A metric and a loss for bounding box regression. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 658–666).
Rigelsford, J. (2004). Neurotechnology for biomimetic robots. Industrial Robot: An International Journal, 31(6), 534.
Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M., et al. (2015). Imagenet large scale visual recognition challenge. International Journal of Computer Vision, 115(3), 211–252.
Ryu, S., Kwon, U., & Kim, H. J. (2016). Autonomous flight and vision-based target tracking for a flapping-wing mav. In 2016 IEEE/RSJ international conference on intelligent robots and systems (IROS) (pp. 5645–5650). IEEE.
Sarlin, P.-E., DeTone, D., Malisiewicz, T., & Rabinovich, A. (2020). Superglue: Learning feature matching with graph neural networks. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 4938–4947).
Sims, C. A., & Uhlig, H. (1991). Understanding unit rooters: A helicopter tour. Econometrica: Journal of the Econometric Society, 59, 1591–1599.
Tan, M., Pang, R., & Le, Q. V. (2020). Efficientdet: Scalable and efficient object detection. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 10781–10790).
Tian, Z., Shen, C., Chen, H., & He, T. (2019). Fcos: Fully convolutional one-stage object detection. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 9627–9636).
Van De Weijer, J., Schmid, C., Verbeek, J., & Larlus, D. (2009). Learning color names for real-world applications. IEEE Transactions on Image Processing, 18(7), 1512–1523.
Voigtlaender, P., Luiten, J., Torr, P. H., & Leibe, B. (2020). Siam r-cnn: Visual tracking by re-detection. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 6578–6588).
Wu, X., Li, W., Hong, D., Tao, R., & Du, Q. (2021). Deep learning for unmanned aerial vehicle-based object detection and tracking: A survey. IEEE Geoscience and Remote Sensing Magazine, 10(1), 91–124.
Wu, Y., Lim, J., & Yang, M.-H. (2013). Online object tracking: A benchmark. In 2013 IEEE conference on computer vision and pattern recognition (pp. 2411–2418).
Wu, Y., Lim, J., & Yang, M.-H. (2015). Object tracking benchmark. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(9), 1834–1848.
Xia, G.-S., Bai, X., Ding, J., Zhu, Z., Belongie, S., Luo, J., Datcu, M., Pelillo, M., & Zhang, L. (2018). Dota: A large-scale dataset for object detection in aerial images. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 3974–3983).
Xu, Y., Wang, Z., Li, Z., Yuan, Y., & Yu, G. (2020). Siamfc++: Towards robust and accurate visual tracking with target estimation guidelines. In Proceedings of the AAAI conference on artificial intelligence (Vol. 34, pp. 12549–12556).
Yang, W., Wang, L., & Song, B. (2018). Dove: A biomimetic flapping-wing micro air vehicle. International Journal of Micro Air Vehicles, 10(1), 70–84.
Yu, H., Li, G., Zhang, W., Huang, Q., Du, D., Tian, Q., & Sebe, N. (2020). The unmanned aerial vehicle benchmark: Object detection, tracking and baseline. International Journal of Computer Vision, 128(5), 1141–1159.
Zhang, C., & Rossi, C. (2017). A review of compliant transmission mechanisms for bio-inspired flapping-wing micro air vehicles. Bioinspiration & Biomimetics, 12(2), 025005.
Zhang, Z., & Peng, H. (2019). Deeper and wider siamese networks for real-time visual tracking. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 4591–4600).
Zhang, Z., Peng, H., Fu, J., Li, B., & Hu, W. (2020). Ocean: Object-aware anchor-free tracking. In European conference on computer vision (pp. 771–787). Springer.
Zheng, Z., Wang, P., Liu, W., Li, J., Ye, R., & Ren, D. (2020). Distance-iou loss: Faster and better learning for bounding box regression. In Proceedings of the AAAI conference on artificial intelligence (Vol. 34, pp. 12993–13000).
Zhu, P., Wen, L., Du, D., Bian, X., Fan, H., Hu, Q., & Ling, H. (2021). Detection and tracking meet drones challenge. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(11), 7380–7399.
Zhu, Z., Wang, Q., Li, B., Wu, W., Yan, J., & Hu, W. (2018). Distractor-aware siamese networks for visual object tracking. In Proceedings of the European conference on computer vision (ECCV) (pp. 101–117).
Ethics declarations
Conflict of interest
All authors declare no conflicts of interest.
Code availability
The toolkit and experimental results will be made publicly available.
Additional information
Communicated by Oliver Zendel.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Zhao, X., Hu, S., Wang, Y. et al. BioDrone: A Bionic Drone-Based Single Object Tracking Benchmark for Robust Vision. Int J Comput Vis 132, 1659–1684 (2024). https://doi.org/10.1007/s11263-023-01937-0