Abstract
We present a novel event-based representation, the Motion-Encoded Time-Surface (METS), and show how it can be used to address the challenge of pose tracking in high-speed scenarios with an event camera. The core idea is to dynamically encode the pixel-wise decay rate of the Time-Surface so that it accounts for the localized spatio-temporal scene dynamics captured by events, yielding remarkable adaptability to motion dynamics. The consistency between METS and the scene under highly dynamic conditions establishes a reliable foundation for robust pose estimation. Building on this, we employ a semi-dense 3D-2D alignment pipeline to fully unlock the potential of the event camera for high-speed tracking applications. Exploiting the intrinsic characteristics of METS, we further develop specialized lightweight operations that minimize the per-event computational cost. We evaluate the proposed algorithm on public datasets and on our own high-speed motion datasets covering various scenes and motion complexities. The results show that our approach outperforms state-of-the-art pose tracking methods, especially in highly dynamic scenarios, and tracks accurately under extremely fast motions that are inaccessible to other event- or frame-based counterparts. Owing to its simplicity, our algorithm is highly practical, running at over 70 Hz on a standard CPU.
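To make the core concept concrete, the sketch below implements a conventional exponential-decay Time-Surface with a per-pixel decay constant. The specific adaptation rule (tracking each pixel's recent inter-event interval with an exponential moving average of weight `alpha`) is an illustrative assumption for how a pixel-wise decay rate could respond to local motion; it is not the METS formulation from the paper, and the names `on_event`, `render`, `tau`, and `alpha` are hypothetical.

```python
import numpy as np

# Minimal sketch: an exponentially decaying Time-Surface whose decay
# constant is maintained per pixel. The update rule below is an
# illustrative assumption, NOT the METS encoding from the paper.

H, W = 180, 240                    # sensor resolution (e.g., a 240x180 array)
t_last = np.full((H, W), -np.inf)  # timestamp of the most recent event at each pixel
tau = np.full((H, W), 0.03)        # per-pixel decay constant in seconds (30 ms default)

def on_event(x: int, y: int, t: float, alpha: float = 0.5) -> None:
    """Process a single event: adapt this pixel's decay rate, then stamp it."""
    dt = t - t_last[y, x]
    if np.isfinite(dt):
        # Shorter inter-event intervals (faster local motion) -> faster decay.
        tau[y, x] = (1.0 - alpha) * tau[y, x] + alpha * dt
    t_last[y, x] = t

def render(t_now: float) -> np.ndarray:
    """Evaluate the Time-Surface at time t_now; values lie in [0, 1]."""
    return np.exp(-(t_now - t_last) / tau)
```

In this sketch, fast local motion produces short inter-event intervals, shrinking `tau` so that stale timestamps fade quickly, while slowly moving regions retain a longer memory: a simple form of the per-pixel adaptability to motion dynamics that the abstract describes.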
Data Availability
The supplementary material will be released and made available after acceptance.
Acknowledgements
This work was supported by the National Key Research and Development Program of China (2022YFD2001503), the Jiangsu Provincial Key Research and Development Program (BE2022389), and the China Scholarship Council (Grant No. 202306090101).
Author information
Contributions
Conceptualization: NX and LW; methodology: NX; software: NX; formal analysis and investigation: LW; writing—original draft preparation: NX; writing—review and editing: LW, TO and ZY; resources: LW and ZY; supervision: LW and TO.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Consent for Publication
All authors gave explicit consent to the submission of the paper.
Ethical approval
The authors have no relevant ethics approval to disclose.
Additional information
Communicated by Ming-Hsuan Yang.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Below is the link to the electronic supplementary material.
Supplementary file 1 (MP4, 90,443 KB)
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Xu, N., Wang, L., Yao, Z. et al. METS: Motion-Encoded Time-Surface for Event-Based High-Speed Pose Tracking. Int J Comput Vis 133, 4401–4419 (2025). https://doi.org/10.1007/s11263-025-02379-6