Abstract
Cross-view multi-object tracking aims to link objects both across frames and across camera views that have substantial overlap. Although cross-view multi-object tracking has received increased attention in recent years, existing datasets still have several shortcomings: (1) they miss real-world scenarios, (2) they lack diverse scenes, (3) they contain a limited number of tracks, (4) they comprise only static cameras, and (5) they lack standard benchmarks. These shortcomings hinder the investigation and fair comparison of cross-view tracking methods. To address them, we introduce DIVOTrack: a new cross-view multi-object tracking dataset for DIVerse Open scenes with densely tracked pedestrians in realistic, non-experimental environments. DIVOTrack contains fifteen distinct scenarios and 953 cross-view tracks, surpassing all currently available cross-view multi-object tracking datasets. Furthermore, we provide a novel baseline cross-view tracking method, CrossMOT, a unified joint detection and cross-view tracking framework that learns object detection, single-view association, and cross-view matching with an all-in-one embedding model. Finally, we present a summary of current methodologies and a set of standard benchmarks on DIVOTrack to enable fair comparison and conduct a comprehensive analysis of existing approaches and our proposed CrossMOT. The dataset and code are available at https://github.com/shengyuhao/DIVOTrack.
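To make the "all-in-one embedding" idea concrete, the sketch below shows one plausible head layout: a shared backbone feeds a detection head plus two embedding heads, one for single-view association and one for cross-view matching. This is a minimal illustration only, not the authors' released implementation (see the repository linked above); the backbone layers, feature width, and head names are assumptions chosen for brevity.

```python
import torch
import torch.nn as nn


class CrossViewTrackerSketch(nn.Module):
    """Illustrative sketch of a joint detection + dual-embedding head layout.

    NOTE: hypothetical architecture for exposition; the real CrossMOT model
    uses its own backbone, head shapes, and training losses.
    """

    def __init__(self, feat_dim: int = 128):
        super().__init__()
        # Shared backbone producing a downsampled feature map (assumed).
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Detection head (e.g., center/size regression in a CenterNet style).
        self.det_head = nn.Conv2d(64, 4, 1)
        # Embedding head used for single-view (temporal) association.
        self.single_view_head = nn.Conv2d(64, feat_dim, 1)
        # Embedding head used for cross-view (inter-camera) matching.
        self.cross_view_head = nn.Conv2d(64, feat_dim, 1)

    def forward(self, images: torch.Tensor) -> dict:
        feats = self.backbone(images)
        return {
            "det": self.det_head(feats),               # detection outputs
            "sv_emb": self.single_view_head(feats),    # single-view embeddings
            "cv_emb": self.cross_view_head(feats),     # cross-view embeddings
        }


if __name__ == "__main__":
    model = CrossViewTrackerSketch()
    out = model(torch.randn(1, 3, 256, 256))
    print({k: v.shape for k, v in out.items()})
```

The point of the split heads is that temporal association within one camera and identity matching across cameras place different demands on the embedding space, while detection still benefits from sharing the same backbone features.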
Acknowledgements
The authors would like to thank Tianqi Liu, Zining Ge, Kuangji Chen, Xubin Qiu, Shitian Yang, Jiahao Wei, Yuhao Ge, Hao Chen, Bingqi Yang, Kaixun Jin, Zeduo Yu and Donglin Gu for their work on dataset collection and annotation. This work is supported by the Fundamental Research Funds for the Central Universities No. 226-2023-00045, the National Key R&D Program of China under Grant No. 2022ZD0162000, and the National Natural Science Foundation of China No. 62106219.
Additional information
Communicated by D. Scharstein.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Hao, S., Liu, P., Zhan, Y. et al. DIVOTrack: A Novel Dataset and Baseline Method for Cross-View Multi-Object Tracking in DIVerse Open Scenes. Int J Comput Vis 132, 1075–1090 (2024). https://doi.org/10.1007/s11263-023-01922-7