Abstract
We present a novel event-based representation, the Motion-Encoded Time-Surface (METS), and show how it can be used to address the challenge of pose tracking in high-speed scenarios with an event camera. The core idea is to dynamically encode the pixel-wise decay rate of the Time-Surface so that it accounts for the localized spatio-temporal scene dynamics captured by events, yielding remarkable adaptability to motion dynamics. The consistency between METS and the scene under highly dynamic conditions establishes a reliable foundation for robust pose estimation. Building on this, we employ a semi-dense 3D-2D alignment pipeline to fully unlock the potential of the event camera for high-speed tracking applications. Exploiting the intrinsic characteristics of METS, we further develop specialized lightweight operations that minimize the per-event computational cost. We evaluate the proposed algorithm on public datasets and on our own high-speed motion datasets covering various scenes and motion complexities. The results show that our approach outperforms state-of-the-art pose tracking methods, especially in highly dynamic scenarios, and tracks accurately under extremely fast motions that are inaccessible to other event- or frame-based counterparts. Owing to its simplicity, our algorithm is highly practical, running at over 70 Hz on a standard CPU.
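To make the core concept concrete, the sketch below implements a conventional exponential-decay Time-Surface with a per-pixel decay constant. The specific adaptation rule (tracking each pixel's recent inter-event interval with an exponential moving average of weight `alpha`) is an illustrative assumption for how a pixel-wise decay rate could respond to local motion; it is not the METS formulation from the paper, and the names `on_event`, `render`, `tau`, and `alpha` are hypothetical.

```python
import numpy as np

# Minimal sketch: an exponentially decaying Time-Surface whose decay
# constant is maintained per pixel. The update rule below is an
# illustrative assumption, NOT the METS encoding from the paper.

H, W = 180, 240                    # sensor resolution (e.g., a 240x180 array)
t_last = np.full((H, W), -np.inf)  # timestamp of the most recent event at each pixel
tau = np.full((H, W), 0.03)        # per-pixel decay constant in seconds (30 ms default)

def on_event(x: int, y: int, t: float, alpha: float = 0.5) -> None:
    """Process a single event: adapt this pixel's decay rate, then stamp it."""
    dt = t - t_last[y, x]
    if np.isfinite(dt):
        # Shorter inter-event intervals (faster local motion) -> faster decay.
        tau[y, x] = (1.0 - alpha) * tau[y, x] + alpha * dt
    t_last[y, x] = t

def render(t_now: float) -> np.ndarray:
    """Evaluate the Time-Surface at time t_now; values lie in [0, 1]."""
    return np.exp(-(t_now - t_last) / tau)
```

In this sketch, fast local motion produces short inter-event intervals, shrinking `tau` so that stale timestamps fade quickly, while slowly moving regions retain a longer memory: a simple form of the per-pixel adaptability to motion dynamics that the abstract describes.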
Data Availability
The supplementary material will be released and made available after acceptance.
Acknowledgements
This work was supported by the National Key Research and Development Program of China (2022YFD2001503), the Jiangsu Provincial Key Research and Development Program (BE2022389), and the China Scholarship Council (Grant No. 202306090101).
Author information
Contributions
Conceptualization: NX and LW; methodology: NX; software: NX; formal analysis and investigation: LW; writing—original draft preparation: NX; writing—review and editing: LW, TO and ZY; resources: LW and ZY; supervision: LW and TO.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Consent for Publication
All authors gave explicit consent to the submission of the paper.
Ethical approval
The authors have no relevant ethics approval to disclose.
Additional information
Communicated by Ming-Hsuan Yang.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Below is the link to the electronic supplementary material.
Supplementary file 1 (MP4, 90,443 KB)
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Xu, N., Wang, L., Yao, Z. et al. METS: Motion-Encoded Time-Surface for Event-Based High-Speed Pose Tracking. Int J Comput Vis 133, 4401–4419 (2025). https://doi.org/10.1007/s11263-025-02379-6