
METS: Motion-Encoded Time-Surface for Event-Based High-Speed Pose Tracking

Published in: International Journal of Computer Vision

Abstract

We present a novel event-based representation, named Motion-Encoded Time-Surface (METS), and show how it can be used to address the challenge of pose tracking in high-speed scenarios with an event camera. The core concept is to dynamically encode the pixel-wise decay rate of the Time-Surface so that it accounts for the localized spatio-temporal scene dynamics captured by events, yielding remarkable adaptability to motion dynamics. The consistency between METS and the scene under highly dynamic conditions establishes a reliable foundation for robust pose estimation. Building upon this, we employ a semi-dense 3D-2D alignment pipeline to fully unlock the potential of the event camera for high-speed tracking applications. Given the intrinsic characteristics of METS, we further develop specialized lightweight operations that minimize the per-event computational cost. The proposed algorithm is evaluated on public datasets and on our own high-speed motion datasets covering various scenes and motion complexities. The results show that our approach outperforms state-of-the-art pose tracking methods, especially in highly dynamic scenarios, and that it tracks accurately under extremely fast motions that are inaccessible to other event- or frame-based counterparts. Owing to its simplicity, our algorithm is highly practical, running at over 70 Hz on a standard CPU.
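To make the core idea concrete, the sketch below illustrates a time-surface whose exponential decay constant is maintained per pixel rather than globally, so that regions with a high local event rate decay quickly while slower regions persist. This is a minimal sketch only: the class name `AdaptiveTimeSurface`, the moving-average update of the per-pixel decay constant, and all parameter values are assumptions made for illustration and are not the METS formulation described in the paper.

```python
import numpy as np

class AdaptiveTimeSurface:
    """Illustrative time-surface with a per-pixel decay rate.

    A standard time-surface assigns each pixel exp(-(t_query - t_last) / tau)
    with a single global decay constant tau. This sketch instead keeps a
    per-pixel tau blended from the local inter-event interval (an assumed
    update rule, not the paper's), so fast-moving regions fade faster.
    """

    def __init__(self, height, width, tau_init=0.03, alpha=0.5):
        self.t_last = np.full((height, width), -np.inf)  # last event timestamp per pixel [s]
        self.tau = np.full((height, width), tau_init)    # per-pixel decay constant [s]
        self.alpha = alpha                               # smoothing factor for tau updates

    def update(self, x, y, t):
        """Process one event (x, y, t): refresh the timestamp and adapt the decay rate."""
        dt = t - self.t_last[y, x]
        if np.isfinite(dt) and dt > 0:
            # Blend the observed inter-event interval into the local decay constant.
            self.tau[y, x] = (1 - self.alpha) * self.tau[y, x] + self.alpha * dt
        self.t_last[y, x] = t

    def render(self, t_query):
        """Evaluate the surface at time t_query; values lie in [0, 1], 0 where no event occurred."""
        age = np.maximum(t_query - self.t_last, 0.0)
        return np.exp(-age / np.maximum(self.tau, 1e-6))


# Minimal usage: integrate a few synthetic events and render the surface.
surf = AdaptiveTimeSurface(height=180, width=240)
for x, y, t in [(10, 20, 0.001), (10, 20, 0.002), (11, 20, 0.0025)]:
    surf.update(x, y, t)
ts_image = surf.render(t_query=0.003)  # 2D array that a 3D-2D alignment step could consume
```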



Data Availability

The supplementary material will be released and made available after acceptance.


Acknowledgements

This work was supported by the National Key Research and Development Program of China (2022YFD2001503), the Jiangsu Provincial Key Research and Development Program (BE2022389), and the China Scholarship Council Program (Grant No. 202306090101).

Author information

Contributions

Conceptualization: NX and LW; methodology: NX; software: NX; formal analysis and investigation: LW; writing—original draft preparation: NX; writing—review and editing: LW, TO and ZY; resources: LW and ZY; supervision: LW and TO.

Corresponding author

Correspondence to Lihui Wang.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Consent for Publication

All authors gave explicit consent to the submission of the paper.

Ethical approval

The authors have no relevant ethics approval to disclose.

Additional information

Communicated by Ming-Hsuan Yang.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file 1 (mp4 90443 KB)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Xu, N., Wang, L., Yao, Z. et al. METS: Motion-Encoded Time-Surface for Event-Based High-Speed Pose Tracking. Int J Comput Vis 133, 4401–4419 (2025). https://doi.org/10.1007/s11263-025-02379-6
